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ABSTRACT 


The  high  resolution  direction-of-arrival  (DOA)  estimation  has  been  an 
important  area  ot  research  for  a  number  of  years.  Many  researchers  have  developed 
a  variety  of  algorithms  to  estimate  the  direction  of  arrival.  Another  important 
aspect  of  the  DOA  estimation  area  is  Ihe  development  of  high  speed  hardware 
capable  of  computing  the  DOA  in  real  time.  In  this  research  we  have  first  focussed 
on  the  development  of  parallel  architecture  for  multiple  signal  classification 
(MUSIC)  and  estimation  os  signal  parameters  by  rotational  invariance  technique 
(ESPRIT)  algorithms  for  the  narrow  band  sources.  These  algorithms  are  substituted 
with  computationaly  efficient  modules  and  converted  to  pipelined  and  parallel 
algorithms.  For  example  one  important  computation  of  eigendecomposition  of  the 
covariance  matrix  has  been  performed  using  Householders  transformations  and  QR 
method.  MUSIC/ESPRIT  algorithms  are  also  shown  in  detailed  pipelined 
flowchart.  Systolic  architectures  are  developed  for  both  MUSIC/ESPRIT  algorithms. 
Two  other  approaches  of  using  cordic  processors  and  single  instruction  multiple 
data  (SIMD)  machines  for  the  computation  of  MUSIC/ESPRIT  algorithms  are  also 
being  studied. 

The  second  part  of  this  research  is  to  perform  DOA  estimation  for  the 
wideband  sources.  Current  literature  is  being  studied  and  at  the  present  time  three 
algorithms  are  under  investigation.  For  one  of  the  three  algorithms  we  have 
proposed  a  computationally  simple  algorithm  and  its  flowchart  is  also  shown. 

Chapter  I  presents  theoretical  and  mathematical  aspects  of  MUSIC/  ESPRIT 
algorithms.  These  algorithms  are  modified  and  parallelized  and  described  in 
Chapter  II.  Hardware  implementations  of  tnese  algorithms  are  given  in  Chapter  III. 
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Cordic  Processor  approach  is  shown  in  Chapter  IV.  Wideband  case  is  presented  in 
chapter  V. 
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Chapter  I 


INTRODUCTION  TO  ARRAY  SIGNAL  PROCESSING 

1.1:  INTRODUCTION 


The  high  resolution  direction-of-arrival  (DOA)  estimation  is  important  in 
many  sensor  systems.  It  is  based  on  the  processing  of  the  received  signal  and 
extracting  the  desired  parameters  of  the  DOA  of  plane  waves.  Many  approaches 
have  been  used  for  the  purpose  of  implementing  the  function  required  for  the  DOA 
estimation  including  the  so  called  maximum  likelihood  (ML)  and  the  maximum 
entropy  (ME)  methods  [1-3].  Although  they  are  widely  used,  they  have  met  with 
only  moderate  success.  The  ML  method  yields  to  a  set  of  highly  non  linear 
equations,  while  the  ME  introduces  bias  and  sensibility  parameters  estimates  due  to 
use  of  uii  incorrc*...  *uode  (e.g.  AR  rathei  Ihuu  ARMA).  The  Multiple  Signal 
Classification  (MUSIC)  and  the  Estimation  of  Signal  Parameters  by  Rotational 
Invariance  techniques  (ESPRIT)  algorithms  are  two  novel  approaches  used  recently 
to  provide  asymptotically  unbiased  and  efficient  esiimatea  of  Ihc  DOA  i4][5].  They 
are  believed  to  be  the  most  promising  and  leading  candidates  for  further  study  and 
hardware  implementation  for  real  time  applications.  They  estimate  the  so  called 
signal  subspace  from  the  array  measurements.  The  parameters  of  interest  (i.e. 
determining  of  the  DOA)  are  then  estimated  from  the  intersection  between  the  array 
manifold  and  the  estimated  subspace. 

1.2:  DATA  MODEL 


For  the  purpose  of  understanding  the  advantages  of  using  a  sensor  array  in 


DOA  estimation,  it  is  necessary  to  explore  the  nature  ot  signals  and  noise  the  array  is 
desired  to  receive.  It  is  well  known  that  in  active  sensing  situations,  the  scattered 
data  tluctuates  ranciomly  about  a  true  value  representing  a  noise  free  signal.  This  is 
due  to  noise  effects  and  errors  in  a  sensor  array  system.  These  fluctuations  can  be 
both  additive  and  multiplicative.  The  additive  fluctuations  are  due  to  thermal 
noise,  shot  noise,  atmospheric  noise,  and  other  kinds  of  noise  which  are 
independent  of  the  desired  signal.  The  multiplicative  fluctuations  are  due  to 
measured  errors  in  estimating  the  signal  amplitudes,  gain  variation,  etc.  A  noise 
model  that  represents  all  these  noise  effects  is,  in  general,  difficult  to  obtain, 
especially  when  some  of  the  noise  sources  are  dominant.  Usually,  based  on  the 
noise  models,  additive  and/or  multiplicative,  the  calculated  probability  of  error,  as  a 
function  of  the  noise  power,  is  practically  similar  in  each  case.  This  indicates  that 
the  noise  power  rather  than  its  specific  characteristics,  has  more  impact  on  the 
sensor  array  performance.  Moreover,  one  is  usually  concerned  with  the  effects  of 
the  additive  noise  on  the  output  of  a  sensor  array  system.  For  this  reason,  an 
additi.e  noise  would  be  appropriate  to  choose  for  the  evaluation  of  the  performance 
of  a  system.  This  noise  represents  the  totality  of  small  independent  sources,  and  by 
virtue  of  the  central  limit  theorem  one  can  model  the  resulting  noise  as  Gaussian 
and  (usually)  stationary  process.  Also,  to  make  the  problem  analytically  tractable,  we 
focus  on  narrow  band  signals  where  it  is  assumed  that  the  power  of  all  emitter 
signals  is  concentrated  in  the  same  narrow  frequency  band.  In  this  context,  two 
more  assumptions  that  are  of  interest  are  invoked.  First,  we  assume  that  the 
sources  are  in  the  far  field  of  the  array,  consequently  the  radiation  impinging  on  the 
array  is  in  the  form  of  plane  waves,  and  secondly,  the  transmission  medium  is 
assumed  to  be  isotropic  so  that  the  radiation  propagates  in  straight  line.  Based  on 
these  assumptions,  the  output  of  any  array  element  can  be  represented  by  a  time 
advanced  version  or  time  delayed  version  of  the  received  signal  at  a  reference 


eleir.ent  as  shown  in  Figure  1.1. 


A  Sin  0 


Figure  1.1;  A  two  sensor  array 

Since  the  narrow-band  signals  are  assumed  to  have  the  same  known  frequency  co, 
the  received  signals  at  the  reference  sensor  and  the  second  sensor  are  respcctivelv 
given  by 

s(t)  =  u(t)  exp[  j(a)^,t  +  v(t))]  (1.1) 

s(t-T)  =  u(t-t)  exp[  (t  -  x)-!-  v(t- 1))]  (1.2) 

where  u(t),  and  v(t)  are  the  amplitude  and  phase  of  s(t)  respectively-  The  signal 
s(t-  t)  at  the  second  sensor  is  delayed  by  the  time  required  for  the  plane  wave  to 
propagate  through  A  sin  9,  and  if  c  represents  the  velocity  of  propagation,  then  this 
time  delay  x  is  given  by 

A  sin  0 

I  =  — -  (1.3) 

The  narrow  band  assumption  implies  that  u(t)  and  v(t)  are  slowly  varying 


functions,  thus; 
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functions,  thus: 

u(t)  =  u{t-i)  (1.4) 

v(t)  =  v(t-T:)  (1.5) 

for  all  possible  propagation  delays.  Thus  the  effect  of  a  time  delay  on  the  received 
signal  is  simply  a  phase  shift; 

s(t-i)=  s(t)exp(-j(dgi)  (1.6) 

.\ow  consider  an  array  consisting  of  m  sensors  and  receiving  signals  from  d  sources 
located  at  directions  0i  02, ...  0d  with  respect  to  the  line  of  array,  as  shown  in  Figure 


1.2. 
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Reference  element 

Figure  1.2:  Typical  array  scene. 


It  is  assumed  that  none  of  the  signals  are  coherent  (i.e.  1  pij  1  ).  Using 

superposition  of  signal  contribution,  the  received  signal  at  the  sensor  can  be 
written  as 

d 

^  =  X  ^  ^  k^®  i) )  +  n  ^(t) 

i=l 


d 

=  ^akOi)  s  i(t)  expf-jcOoXkOj) )  +  n^(t)  (1.7) 

i=l 


where  x  k(0  j)  is  the  propagation  delay  between  a  reference  point  and  the  k**’  sensor 
for  the  i^*^'  wavefront  impinging  on  the  array  from  direction  0i ,  ak(0i)  is  the 
corresponding  sensor  element  complex  response  (gain  and  phase)  at  frequency 
CO  o  and  nic(t)  stands  for  the  additive  noise  at  the  k^*’  sensor.  If  we  let 


(1.8) 


a(  0  j)  =  I  a  ,(0  i)exp(jco ,(0, ))....  a  ^(0  i)exp(jw ^(0  j))}“ 


Where  denotes  complex  conjugate  transpose 
and  n(t)  =  {  n  j(t),  nj/t) . 


the  data  model  representing  the  outputs  of  m  sensors  becomes 
d 

x(t)=^a(  0i)s  j(t) +n(t)  (1.9) 

i=l 


Now  by  setting 

A(0)=(a(0  i),a(02),...,a(0j))  (1.10) 


and 

s(t)  =  (  s  i(t),  s  2(0, ... ,  s  ^(t))^  (1.11) 


x(t)  can  be  rewritten  as 


x(t)  =  A(0)  s(t)  +  n(t)  (1.12) 


where 

x(t),  n(t)  e  C"’,  s(t)e  and  A(0)  € 


A(0)  is  called  the  direction  matrix.  The  columns  of  A(0)  are  elements  of  a  set, 
termed  the  array  manifold,  composed  of  all  array  response  vectors  obtained  as 
ranges  over  the  entire  space.  If  we  assume  that  signals  and  noise  are  stationary,  zero 
mean,  uncorrelated  random  processes  and  further  the  noises  in  different  sensors  are 
uncorrelated,  the  spatiul  correlation  matrix  of  the  observed  signal  vector  x(t)  is 
defined  by: 

R  =  E  (x(t)  x“(t))  (1.13) 


where  E  is  the  expectation  operator. 

The  substitution  of  Equation  (1.12)  into  (1.13)  gives 


R  ,,,  =  E  (  AO)  s(t)  s(t)“(A(e))”+  I 
=A(e)  R^3  A(0)^  +  a^I 


(1.14) 


where 

R33=E(s(t)s(t)”)  (1.15) 

and  g2.  I  is  the  spatial  correlation  matrix  of  the  noise  vector  n(t),  denotes  the 
variance  of  the  elemental  noise  njlt).  i  =  1, ...  n 


1.3;  MULTIPLE  SIGNAL  CLASSIHCATION  (MUSIC)  ALGORITHM 

Consider  first  the  noise  free  case  where 

d 

x(t)=^  a(ei)Si(t)  (1.16) 

i=l 

This  means  that  x(t)  is  a  linear  combination  of  the  d  steering  column  vectors  of  A(0) 
and  is  therefore  constrained  to  the  d-dimensional  subspace  of  Cm,,  termed  the  signal 
subspace,  that  is  spanned  by  the  d  columns  vectors  of  A(0).  In  this  case  the  signal 
subspace  intersects  the  array  manifold  at  the  d  steering  vectors  a(  0  i )  as  shown  in 
Figure  1.3. 
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Figure  1.3:  Signal  subspace  and  array  manifold  for  a  two-source  example. 

However,  when  the  data  is  corrupted  by  noise,  the  signal  subspace  has  to  be 

estimated  and  consequently  it  is  expected  that  the  signal  subspace  will  not  intersect 

the  array  manifold,  so  the  steering  vectors  closest  to  the  signal  subspace  will  be 

chosen  instead  [6].  In  the  following,  it  is  shown  that  one  set  of  d  independent 

vectors  that  span  the  signal  subspace  is  given  by  the  d  eigenvectors  corresponding  to 

the  d  largest  eigenvalues  of  the  data  covariance  matrix.  The  data  covariance  matrix 

is  assumed  to  be  positive  definite  and  Hermetian  and  consequently  its 

eigendecomposition  is  given  by 

R  =  E  A  e“  (E  e”  =I) 

R,,E  =  EA 


(A(e)R  ^^AO)  +  o'  I )  E  =  E  A 
A(e)R  A(e)”  E  =  E  A  -  a^.  E 
=:►  E  ”  A(e)R^^A(0)“  E  =e”  EA  -  a"  e“e 

-A  -a^.I 

A(0)R  A(0)”  =  E(  A  -  a^.I)  e”  Vl.l/) 

Thus  the  eigenvalues  of  A(0)  R  A(0)  are  the  d  largest  eigenvalues  of  R 

2  2 

augmented  by  a  .  Also  the  (m-d)  smallest  eigenvalues  are  all  equal  to  a  Now  if 

( X  e  j)  is  an  eigenpair  of  R  ,  then 

R^e.  =  X.e.  (1.18) 

and  for  any  i  >  d. 


( A(0)R33A(0)“  +  all)ei= 

=>  A(0)R33A(0)“  C;  =  0  (1.19) 

Now  from  the  fact  that  A(0)  and  Rjs  must  have  at  least  one  nonsingular 

submatrix  of  order  d  and  without  loss  of  generality,  suppose  that  this  submatrix 
consists  of  the  first  d  rows  of  A  j(0)  R^^.  Partition  A  |(0)Rss  as: 


A(e)R„=  (A,(9)R„,A2(e)R^)^ 


(1.20) 


The  substitution  of  (1.20)  into  (1.19)  yields 
A,(0)R33  A(0)”  ei  =  0 


(1.21) 


and 


(1.22) 


1  0 

A2(e)R,3  A(e)“  6^  =  0 

For  the  equation  (1.21)  to  be  satisfied, 

H 

AO)  6^=0  i>d  (1.23) 

Thus  6  1,6  2,  •••  6  d  span  the  same  subspace  spanned  by  the  column  vectors  of 
A(6).  In  most  situations,  the  covariance  matrices  are  not  known  exactly  but  need  to 
be  estimated.  Therefore,  one  can  expect  that  there  is  no  intersection  between  the 
array  manifold  and  the  signal  subspace.  However,  elements  of  the  array  manifold 
closest  to  the  signal  subspace  should  be  considered  as  potential  solution.  After 
determining  the  number  of  sources  [7],  Snaith  [5]  proposed  the  following  function  as 
one  possible  measure  of  closeness  of  an  element  of  the  array  manifold  to  the  signal 
subspace 


_ 1 _ 

a“(0)E„E”na(e) 


where  E  „  =  t  ^d^i  ^  J 


(1.24) 


The  dominant  d  peaks  over  9  e  [-  tc,  ti]  are  the  desired  estimates  of  the  directions  of 
arrival. 


For  the  particular  case  where  the  array  consists  of  m  sensors  uniformly 

spaced,  and  if  the  reference  point  is  taken  at  the  first  element  of  the  array,  Pm(0  )  is 

obtained  by  first  calculating  the  DFT  of  the  vectors  spanning  the  null  space  of 
AOIR^^AO)”  or 


^  n  “  t  '  ®d+2 


(1.25) 


1 1 

If  A  is  the  distance  separating  two  sensors  of  the  array,  an  element  of  the  array 
manifold  is  given  by 

a(0  )  =  (  1  ,  exp  ( j2  nA  sin0  /  A.), ... ,  exp  ( j2  ;r(m-l)A  sin0  /  k))  ^  (1.26) 

and  the  DFT  of  the  vector  e  j,  i  >  d  is  given  by 

m 

F  j=  a*  (0  )  e  j  =^6^1  exp(  -j2  7i(k-l)A  sin0  /  X)  (1.27) 

i=l 

thus 

—  a -28) 

1  Fi' 

k=l 

Summary  of  the  MUSIC  algorithm 

1)  Estimate  the  data  covariance  matrix  R. 

2)  Perform  the  eigendecomposition  of  R. 

3)  Estimate  the  number  of  sources. 

4)  Evaluate  P  ^(0  ) . 

5)  Find  the  d  largest  peaks  of  P  ^(0  )  to  obtain  estimates  of  the  parameters 

Although  MUSIC  is  a  high  resolution  algorithm,  it  has  several  drawbacks 
including  the  fact  that  complete  knowledge  of  the  array  manifold  is  required,  and 
that  is  computationaly  very  expensive  as  it  requires  a  lot  of  computations  to  find  the 
intersection  between  the  array  manifold  and  the  signal  subspace.  In  the  next  section, 
another  algorithm  known  as  ESPRIT  will  be  discussed.  Even  though  it  is  similar  to 


the  MUSIC  and  exploits  the  underlying  data  model  but  it  eliminates  the 
requirement  of  a  tim^-consuming  parameter  search. 

1.4;  ESTIMATION  OF  SIGNAL  PARAMETERS  VIA  ROTATIONAL 
INVARIANCE  (ESPRIT) 

Consider  a  planar  array  composed  of  m  pairs  of  pair  identical  sensors 
(doublets)  as  shown  in  the  Figure  1.4.  The  displacement  between  two  sensors  in 
each  doublet  is  constant,  but  the  sensor  characteristics  are  unknown  [5]. 


Figure  1.4;  Sensor  array  for  ESPRIT. 


The  signal  received  at  the  i 


th 


doublet  can  be  expressed  as 


(1.29) 


Cxx=Rxx- 

=  A(e)R33A*(e) 


(1.31) 


and 

R^y  =  A(e)R^^<I)*A*(0)  (1.32) 

In  the  computation  of  R  the  noise  in  different  sensors  is  assumed  to  be 
uncorrelated  (E[n  ^(t)  n  y(t)]  =  0). 

The  eigenvalues  of  the  matrix  pencil  (C  R  are  obtained  by  solving 

C„.  rR.,  =0  (1.33a) 


or 


(1.33b) 


1  4 


A(e)R„(I-Y<I)W(0) 


=  0 


Now  from  the  fact  that  A(6)  and  R  are  full  rank  matrices.  Equation  (1.33b)  reduces 


to 


I-  yd)* 


=  0 


(1.34) 


and  the  desired  singular  values  are 

jco^A  sine^ 

Y,^=exp( - - - )  k=l,...,d  (1.35) 


Thus  the  direction  of  arrival  can  be  obtained  without  involving  a  search 

technique  as  in  the  MUSIC  case,  and  in  that  respect  computation  and  storage  costs 

are  reduced  considerably.  Also  it  can  be  concluded  can  conclude  that  the  generalized 
eigenvalue  matrix  associated  with  the  matrix  pencil  (C  R  ^y)  is  given  by: 


O  0 
0  0 


(1.36) 


However,  due  to  error  in  estimating  R  xx  and  R  xy  from  a  firxite  data  sample  as 
well  as  round-off  errors  introduced  during  the  squaring  of  the  data,  the  relation 
between  A  and  O  given  above  is  not  exactly  satisfied,  which  make  this  method 
suboptimal. 

The  following  procedure  is  proposed  to  estimate  the  generalized  eigenvalues  [7] 
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1)  Find  the  data  covariance  matrix  of  the  complete  2m  sensors,  denoted  by 

2)  Estimate  the  number  of  sources  d. 

3)  Estimate  the  noise  variance  (average  of  the  2m  -  d  noise  eigenvalues). 

4)  Compute  R^z-  I,  A(0)  R  ss  A*(0)  and  A(0)  R  (I)*A*(0)  are  then  the  top  left 
and  top  right  blocks. 

5)  Calculate  the  generalized  eigenvalues  of  the  matrix  pencil  (Cxx,  Rxy)  and 
choose  the  d  ones  that  lie  close  to  the  unit  circle. 

1.5:  TOTAL  LEAST  SQUARE  (TLS)  ESPRIT 

The  last  method  is  based  on  having  a  very  good  estimate  of  the  noise 
variance,  a  condition  difficult  to  satisfy  in  most  real  cases.  This  may  yield  overall 
inferior  results.  To  circumvent  this  difficulty  to  some  extent,  the  total-least-square 
(TLS  ESPRIT)  scheme  is  used  instead. 

Let 

=  A  s(t)  +  n^(t)  (1.37) 


where 

T 

and  let  E  j,=  [  e^,  62,...  ]  be  the  (2m  x  d)  matrix  composed  of  the  eigenvectors 

corresponding  to  the  d  largest  eigenvalues  of  (R  I).  Since  the  columns  of  E  ^  and  A 

span  the  same  subspace,  then  there  must  exist  a  non-signular  (d  x  d)  T  matrix  such 
that 


A 

AO 


n,(t) 


nx(t) 

ny(t) 


(1.38) 


z(t)= 


x(t) 

y(t) 


E^=  A  T 


(1.39) 
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Now  define  two  m  x  d  matrices  E  ^  and  E  y  by  partitioning  E  ^  as 


Es  = 


Ex 

AT 

E 

AOT 

y 

(1.40) 


Since  E  ^  and  E  ^  share  a  common  space  (i  e.  the  columns  of  both  E  ^  and  E  ^  are  a 
linear  combination  of  the  columns  of  A),  then  the  rank  of  E  =  [E  ^  I  E  ^,]  is  d  which 

implies  that  there  exist  a  unique  2d  x  d  matrix  F  of  rank  d  such  that 

0=[EJ  Ey]  F  =  E,F^  -HEyFy 

=  ATF^  +  A<hTFy  (1.41) 

(F  span  the  null-space  of  [E  ^  I  Ey] ). 

In  the  above  equation  [E  x  •  E  y]  is  an  m  x  2d  matrix,  it  can  be  seen  as 

consisting  of  m  vectors  in  a  2d  dimensional  space,  and  the  set  of  all  vectors  which 

transform  into  the  zero  vector  (i.e.  which  satisfy  [Ex  I  E y]  x  =  0)  is  called  the  null 

space  of  A,  and  it  has  a  dimension  2d-rank  [Ex  I  E y]  or  d.  Now  if 

'F  =  -FJFy'']  (1.42) 

then 

AT'fT^=AcI)  (1.43) 

If  A  is  assumed  to  be  a  full  rank  matrix: 

T4^r^=<h  (1.44) 

Thus  the  eigenvalues  of  NK  correspond  to  the  diagonal  element  of  <1> 
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Summary  of  the  TLS  ESPRIT 

1)  Obtain  an  estimate  of  the  data  covariance  matrix  R  ,  denoted  by  R 

2)  Perform  the  eigendecomposition  of  R  as  R  =  E  A  E  where 

A=  diag(  X  >  2-- 

and 

E  =  (e,,e2,.... 


3)  Estimate  the  number  of  sources  d. 

4)  Obtain  E [  e, ,  62/...  ]  and  decompose  it  to 


obtain 


Ev,' 


5)  Compute  the  eigendecomposition  of  E  ^^*E  = 


[E^  I  EJ=EAE 


and  partition  E  into  four  d  x  d  submatrices 


E  11  E 12  i 
E=i  I 

j  E  21  E  27 

6)  Calculate  the  eigenvalues  of  T  =  -  E  ,2[  E22  '] 

7)  Estimate  B 

-1 

0  ^  =  Sin  {cargfO  ^)I(cOqA)} 


1.6;  IMPROVED  TLS  ESPRIT 


By  considering  the  eigsndecomposition  of  the  data  matrix  R  of  rank  d, 
following  equation  can  be  written. 


,  =  i  =  a  i=d+l,...,  2m 

Using  the  same  procedure  as  in  the  MUSIC  algorithm 


A  G  =  0 


where 


^  f  ®  d  +  l  '  ®  d+2' .  ® 1 


-Vow  from  the  fact  that  A  and  G  can  be  partitioned  as 


and  G= 


Hence 


, .  H  .  H  .  H.  „ 

(A  ,<1>  A  )  =0 

G. 


A^G^  +  cI)'^a“g^=0 


.  H_  ^  H  . 

A  G^  =  -  <t)  A  Gy 


G^A  =  -  G%A 


(1.45) 


(1.46) 


(1.47) 


(1.48) 


(1.49) 


By  multiplying  both  sides  of  the  above  equation  by  T  defined  in  Equation(1.39). 


g”  AT  =  A<t)T  (1.49a) 

X  V 

or 

=  a.49b) 

Because  E  ^  and  E  ^  span  the  same  subspace,  then  the  objective  in  the  previous  TLS 

algorithm  is  to  find  a  matrix  y  €  such  that 

E^V  =  Ey.  (1.50) 

The  substitution  of  (1.50)  into  (1.49b)  yields 

GxE,=  -G"e,V  (1-51) 

Thus  if  there  exist  which  transforms  E  ^  into  E  ^  ,  this  transformation  must  also 

H  H  H  H 

transform  -  GyE^into  G^E^.  (  Note  that  -  GyE^  and  G  ^E  ^  span  the  same  subspace 

as  spanned  by  the  columns  of  E  ^  or  E  y). 

In  practical  situations,  where  only  a  finite  number  of  noisy  measurements  are 
available.  Equations  (1.50)  and  (1.51)  can  not  be  satisfied  exactly.  A  criterion  for 
obtaining  a  suitable  estimate  of  \|/  must  be  formulated.  The  TLS  is  a  method  of 
fitting  that  is  appropriate  in  this  case  because  Ex  ,  E y,  G^  y  E  x  ,  and  G^  x  E  x  are  all 
noisy  measurements  . 

To  find  a  common  transformation  which  satisfies  both  (1.50)  and  (1.51)  ,  define 
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GyEx 


and 


H.= 


Gx^x 


(1.52) 


thus  \j/  is  given  by 

\|/=  H2 


(1.53) 


The  previous  TLS  algorithm  applied  to  the  model  E  x  V  =  E  y  can  be  viewed  as 
using  m  observations  (  the  number  of  rows  of  Ex  or  E  y).  By  using  Equation  (1.53),  it 
is  easily  verified  that  the  number  of  observations  is  increased  from  m  to  3m-  d  . 
Thus  a  better  estimate  of  \|/  is  believed  to  be  achieved,  and  the  algorithm  of  the 
improved  TLS  will  be  the  same  as  for  the  TLS  with  the  exception  of  replacing  E  xy  by 

EHjHj. 

However  the  same  solution  for  'F  can  be  achieved  by  considering  instead  the 
d  matrices 


HfHjH'Jha 

h“H,  h"h2| 


(1.54) 


where  h  2; 


is  the  i 


•  th 


column  of  the  matrix  H  2-  If  ^  d+i 


is  the  smallest  eigenvalues  of 


^  i  ®d  +  l  "  ^d+1  ®d+1 


Kj,  then 


(1.55) 


Now  by  transforming  e  into  ,  X  j  solves  the  TLS  problem  and  gives  the  i*^ 
column  of  4^  [8]. 

This  transformation  is  very  useful  for  parallel  processing  as  it  avoids  the 
computation  of  F  y  to  find  y.  However  this  method  has  a  disadvantage  as  the 

eigendecomposition  of  d  matrices  must  be  performed  at  the  same  time. 


Chapter  II 


PARALLELIZING  OF  MUSIC/ESPRIT  ALGORITHMS 

2.1;  INTRODUCTION 

First  step  in  the  development  of  parallel  architecture  for  any  algorithm  is  to 
transform  it  into  a  computationally  efficient  algorithm.  This  is  achieved  by 
carefully  studying  the  algorithm  and  substituting  parts  of  the  algorithm  with  easily 
computable  ones.  For  example  MUSIC  and  ESPRIT  involve  the  eigendecomposition 
which  can  be  written  in  FORTRAN  language  or  can  be  computed  by  calling  scientific 
subroutines  from  commercially  available  packages.  In  this  chapter  efforts  are 
directed  to  obtain  more  efficient  procedures  for  computing  MUSIC  and  ESPRIT 
algorithms  and  have  been  explained  in  the  following. 

MUSIC  and  ESPRIT  involve  intensive  matrix  eigendecomposition  which 
involve  generalized  eigenvalue  analysis,  that  is  equivalent  to  the  determination  of 
the  roots  of  characteristic  polynomial  as  shown  below: 

pa)  =  det(R^-XI) 

=  (-l)"(\"  +  C„.jX"‘V . C^X+Co)  (2.1) 

The  idea  of  explicitly  computing  the  coefficients  of  a  polynomial,  at  first  glance  may 

appear  attractive  in  view  of  the  fact  that  operations  on  a  polynomial  are  easy  to 

perform.  However  as  n  increases,  the  round  off  errors  in  computing  the  coefficients 

may  lead  to  very  large  errors  in  the  resulting  roots.  Generally  all  the  methods 
used  in  solving  (2.1)  are  iterative.  In  fact  the  data  covariance  matrix  R  ^ 


can  be  transformed  into  a  diagonal  matrix  by  using  the  iterative  scheme  described 
below. 

ifRxx'R  I  is  an  m  x'm  matrix  and  if  Q  j  is  an  orthogonal  matrix,  it  can  be 

shown  that  the  eigenvalues  of  R  jare  invariant  under  a  congruent  transformation 
with  Q 

X  =  XX 

(Q  “RiQi)(Q“x)  =  X(q”x)  (2.2) 

H  H 

Thus  if  (X,  X)  is  eigenpair  of  R^  then  (  Q  j  X,  X. )  is  eigenpair  of  R^  Q  i  =  R2,  where 

Q  denotes  complex  conjugate  transpose  of  matrix  Q.  The  same  procedure  is 
repeated  for  R  2  to  obtain  R  3  and  so  on.  The  matrices  Q  j's  are  determined  in 

such  a  way  that  after  a  certain  number  of  iterations  the  matrix  R  ^  converges  to  a 

diagonal  one. 

The  evaluation  of  eigenvalues  and  eigenvectors  are  very  important  in  the 
process  of  parallelizing  the  algorithm.  Accuracy  in  the  computations  of  eigenvalues 
and  eigenvectors  will  also  determine  the  accuracy  of  the  angle  of  arrival.  The  DOA 
computations  need  to  be  performed  in  real  time,  hence  fast  evaluation  of 
eigenvalues  and  eigenvectors  are  important.  The  averaging  technique  used  at 
various  sensors  produces  a  Hermetian  covariance  matrix  from  which  the 
eigenvalues  and  eigenvectors  can  be  obtained. 

Structured  implementation  suitable  to  the  high  resolution  DOA  estimation  is 
described  for  both  MUSIC  and  ESPRIT  algorithms.  Figure  2.1  gives  the  flow  chart  of 
a  pipelined  arrangement  of  MUSIC  algorithm.  The  number  of  sensors  depend  on 
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Figure  2.1:  Flowchart  for  MUSIC  algorithm 
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the  number  of  sources  need  to  be  located  at  a  given  time.  The  number  of  sensors 
should  always  be  greater  than  the  number  of  sources  [4]  [5].  Assuming  that  the 
number  of  signals  never  exceeds  seven,  a  sensor  array  of  eight  is  considered  in  the 
rest  of  this  report. 

As  shown  in  Figure  2.1,  the  data  is  collected  from  the  sensors.  It  consists  of 
noisy  data  vectors  X(i)'s  whose  components  are  x  j  ,  where  j  and  i  mean  sensor 
and  i^^'  sample  respectively.  Also  a  data  point  at  a  particular  sensor  consists  of  in- 
phase  and  quadratic-phase  components.  The  data  covariance  matrix  Rxxis 
computed  using  the  sampled  data. 

The  eigendecomposition  is  performed  on  R^x  and  m  eigenvalues  are  derived 
from  which  the  number  of  sources  is  estimated  by  testing  the  null  hypothesis  that 
the  smallest  eigenvalue  has  multiplicity  m-d,  namely  Hd  =  ^  d+i  =^d+2  =  •  •  •  •=  m  is 
tested.  The  likelihood  ratio  for  the  problem,  under  Guassian  assumption,  is  given 
by: 


whereX,>X2>---  > 

The  hypothesis  Hd  (  d  =  0,  1, . m-1)  are  tested  sequentially,  and  the  smallest 

likelihood  ratio  is  chosen.  The  final  phase  of  MUSIC  is  the  determination  of  the 
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angle  of  arrival  which  can  be  approximated  from  the  elements  of  array  manifold 
closest  to  the  signal  subspace  by  using  the  measure  function  Pm(0)  described  in 
chapter  I.  The  plotting  of  Pm(9)  is  computationaly  very  expensive  as  it  requires  the 
calculation  of  Pm(0)  for  every  angle  0  €  (-  jr,  k)  to  find  the  intersection  between  the 
array  manifold  and  the  signal  subspace  given  by  the  d  peaks  of  Pm(0). 

Similar  to  pipelined  MUSIC  flowchart,  a  flowchart  for  TLS  -  ESPRIT 
algorithm  is  shown  in  Figure  2.2.  The  different  steps  involved  in  estimating  the 
DOA  using  ESPRIT  are  given.  ESPRIT  is  similar  to  MUSIC  both  of  which  exploit  the 
underlying  data  model,  but  it  eliminates  the  computation  of  Pm(0)  for  every 
possible  0.  However  it  involves  some  more  eigendecomposition,  i.e.,  the 
eigendecomposition  of  Exy  and  -E  12  22  ^rid  the  inversion  of  matrix  E22  as 

shown  in  Figure  2.2.  Also  one  should  note  that  the  order  of  data  covariance  matrix 
has  increased  from  m  to  2m.  In  the  following  sections,  algorithms  for  various 
computational  modules  required  in  MUSIC  and  ESPRIT  are  developed. 

2.2:  DATA  COVARIANCE  MATRIX  FORMATION 

Once  the  data  has  been  collected  by  the  sensors  the  data  covariance  matrix  can 
be  computed  using 


Figure  2.2:  Flowchart  for  pipelined  a rangement  for  ESPRIT 
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y^x(p).x(p)* 


(2.3) 


Where  R  =  Covariance  of  data  matrix 

tR 

x(p)  =  Vector  of  data  output  from  every  sensor  at  p  sample 
given  by  (x  ^ ,  X  2  , . x  g)^ 

n  =  Number  of  samples 


The  sampled  data  obtained  from  the  sensors  is  used  to  obtain  the  data 
covariance  matrix  given  by  equation  (2.3).  For  instance,  the  element  (i,j)  of  Rxx 
denoted  by  R  jj  is  computed  as: 

n 


^Xj  (p)  .  Xj(p) 


Since  the  covariance  matrix  is  Hermetian,  the  computation  of  lower  triangular 
matrix  of  covariance  matrix  is  sufficient  to  get  complete  information  of  the  full 
matrix. 


The  sequence  of  transformations  described  earlier  as  in  equation  (2.2)  to 
reduce  the  data  covariance  matrix  to  a  diagonal  one  is  one  of  the  most  efficient 
algorithms  for  determining  all  the  eigenvalues  and  eigenvectors.  It  is  known  as  QR 
algorithm.  However  when  applied  to  a  dense  matrix,  it  is  very  time  consuming, 
requiring  a  large  number  of  computations.  In  order  to  circumvent  this  drawback, 
the  data  covariance  matrix  is  transformed  first  to  tridiagonal  one  using 
Householders  transformation.  Chen  [9]  and  Dongarra  [10]  discussed  the  reduction 
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of  a  Hermetian  matrix  to  a  tridiagonal  matrix  using  Householders  transformation. 
The  eigenvalues  of  R  xx  ^re  invariant  under  this  transformation  as  the  Householder 
transformation  is  orthogonal.  Hence,  the  Householders  transformation  to  reduce 
the  dense  matrix  to  a  tridiagonal  matrix,  QR  method  to  reduce  tridiagonal  matrix  to 
a  diagonal  one,  the  power  method  to  compute  the  angle  of  arrival  are  discussed 
here. 

2.3;  HOUSEHOLDERS  TRANSFORMATION 

First  the  mathematical  aspect  of  Householder  transformation  is  presented, 

then  its  parallelization  and  its  hardware  implementation  is  explored.  The  method 

of  Householder  [11],  can  effectively  reduce  the  bandwidth  of  data  covariance  matrix 

R  XX  by  transforming  it  to  a  tridiagonal  matrix  T.  In  order  to  transform  the  m  x  m 

data  matrix  into  a  tridiagonal  one,  m-2  Householder’s  transformations  (N)  are 

determined  such  that 
R^^N  =T 

where  N  =  N  ^_2  N  . ^ 


I 

0 

N  = 

0 

N. 

k 

k 


m  -  k 


k  m  -  k 


and 
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N,=  I 


2  w  w 

H 

w  w 


with  N  V  e  and  w  e 


The  matrices  N  k  are  determined  to  eliminate  the  elements  above  and  below 
the  subdiagonals  without  disturbing  any  previously  zeroed  rows  and  columns.  The 
basic  iterative  sequence  of  operations  for  this  transformation  method  can  be  stated 


R  =Rxx 
begin 

For  k=l,2,...,m-2, 


Rk+i  =  N^R^Nk  suchthat  N“Nk  =  I 


T  =  R 


For  k=l,  the  transformation  can  be  written  as 


R2  =  Nj  R^  N  1 


where 


r,  R 


1  m-1 


Therefore 


3  1 


should  have  the  following  form 
w  =  r,+  pcj 

where  e  j  =  (1,  0,  0,  0...)^ 

and  ri  =  (r2i,r3i, . r„,)^ 

P  is  a  complex  number  that  is  to  be  determined. 


f  2ww^ 


|p  ejr,  =  (3rl^e, 

multiplyine;  both  sides  of  (2.5)  by  (3’^. 

(  p*)"e|  r,  =  P  p  e, 
Substitution  of  (2.6)  in  (2.4)  yields 


(2.4) 

(2.5) 


(2.6) 


Using  the  fact  that 


3  4 

♦ 

and  because  of  the  Hermetian  property  the  first  row  becomes  (r  -(3  ,  0,  0,...0). 

In  the  following  and  for  simplicity,  a  method  is  given  to  calculate  the 
elements  of  R  2  for  the  real  case,  i.e.,  the  data  is  composed  of  in-phase  components 
only.  The  same  method  may  be  applied  for  the  complex  case  where  the  data  is 
composed  of  in-phase  and  quadrature-phase  components.  Hence  for  real  variables 
the  Householders  transformation  reduces  to: 

R,  =  N,R,N , 


T  T,  T 

ww  R^ww 

=  R  ]  *  j  R  j  -  R  j  +  4  j  j  (2.8) 

ww  ww  wwww 

Let 

w  =  r,+(3e, 

T 

w  w 


and. 


2ww 


T 

w  w 


r 


R, 


I- 


2ww 

T 

w  w 


2ww 


R^  -  J 

w  w 


R, 


I- 


2ww 

“t 

w  w 


2  ww 


2  ww 


d  =  R  ,w 


m 


Equation  (2.8)  can  be  rewritten  as  follows: 


w  d  d  w 


T  w  (w^  d) 


R,  =  R  ,  -  - 

1  1  c 


T  _  T 


T  T 


wd^  dw^  w(w  d)w  w(w  d)w 


(w^  d)^ 


U  J 

=  Rl  -  7'  T—  W  -  w  — - - 2 

^  2 .  c“  2 .  c 


jT  (w^d)w^ 


T  T 

j‘  d  w w 


T 

V  = 


then 


jjT  (w^  d) 


T 

V  = 


and  from  the  fact  that 


d  w  =  w  d 


J  w  (w  d) 


Also  let 
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T  , 
w  d 


Therefore 


d 

V  =  --  wp 


and  equation  (2.9)  becomes 


T  T 

R2=R|-wv  -  vw 


(2.10) 


The  choice  of  above  equation  is  primarily  motivated  by  the  interest  in  the 
application  of  parallel  processing  in  computing  the  elements  of  the  matrix  R  2,  that 
is,  all  the  columns  of  R  2  can  be  computed  in  parallel  as: 

Rij  =Rij-VjW  -  WjV 

j=T2 . 8 

where  R  and  R  are  the  column  of  R  2  and  R  j  ,  and  v  ^  and  w  j  are  the 


components  of  v  and  w  respectively. 


The  flowchart  for  the  Householders  transformation  showing  its 
parallelization  and  sequence  of  operations  is  as  given  in  Figure  2.3.  The  values  of 
w’  and  'c'  using  the  first  column  of  R  xx-  These  values  of  'w'  and  'c'  are  used  to 
compute  all  the  d  s  in  parallel.  Once  the  d  s  are  evaluated  v  s  and  'p'  are  evaluated. 
Using  all  the  values  evaluated  as  above  new  values  of  the  elements  of  the  column 
are  computed.  This  is  the  first  iteration.  After 


Figure  2.3:  Flowchart  for  parallel  Householders  transformation 
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every  iteration  the  counter  is  incremented.  For  m  -  2  iterations  the  new  values  of 
the  columns  which  are  computed  are  used  in  feedback  loop  to  perform  same 


operations. 


2.4:  QR  METHOD 


Given  the  tridiagonal  matrix  T  and  defining  U  =  NH  which  is  obtained  using 
Householders  transformation,  QR  algorithm  may  be  used  to  compute  eigenvalues 
and  eigenvectors.  This  is  achieved  by  producing  a  sequence  of  transformations 
based  on  orthogonal  matrices  and  illustrated  by  the  following  algorithm. 


u^  =  n“ 

begin 
for  k=l,  n 

■'k  “Qk^k 

T,  ,  =  R,Qu 

k+1  k^k 

^ 1  =  q”  u  ^ 

k+1  k 

end 


After  n  iterations  T  will  be  approximately  a  diagonal  matrix  whose  diagonal 
elements  are  the  eigenvalues  of  the  original  matrix.  The  matrix  U  will  have  rows 
which  are  eigenvectors  of  the  original  matrix. 
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An  example  is  given  to  calculate  new  values  of  T  for  real  case.  Suppose  T  is 
obtained  at  the  iteration  and  we  want  to  calculate  T  the  procedure  :s  ""s 
follows; 

"QLqIz . Q," 

where 

1  0  0 
0  1  0 
0  0  1 

cos  sin 
-sin  cos 

1  0 
0  1 


o 


Let  the  entries  of  the  diagonal  elements  of  tridiagonal  matrix  be  a(m,k)  and  the 
entries  of  subdiagonal  elements  be  b(m,k)  where  m  is  row  or  column  number  and  k 
is  iteration  number.  The  c(m,k)  and  s(m,k)  are  sine  and  cosine  that  gives  angle  of 

th  th 

rotation  of  m  and  (m+1)  rows  and  u(m,k)  will  be  the  eigenvectors  at  iteration  k. 


The  tridiagonal  matrix  T  is  given  by 
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a(l,k)  b(2,k) 
b(2,k)  a(2,k)  b(3,k) 

b(3,k)  a(3,k) 


th  . 


From  the  k  iteration  the  updated  T  using  the  following  equation  can  be 
computed. 

Tk.i  =  qLq„Vq7t,q,.q„.2q„ 

However  the  updated  entries  b(i,  k+1)  and  a(i,  k+1)  depend  only  on  the  matrices  Q  . 


and  Q  i  j  •  They  are  calculated  using 
0^0^  TOO 


(2.11) 


The  possibility  of  parallelizing  the  QR  algorithm  is  explored  base  on  [15]. 
Detailed  elaboration  is  given  below.  To  derive  the  equations  for  parallel  algorithm 
the  following  parameters  are  defined: 


cos(i,  k)  =  cl,  cos(i-l,  k)  =  cO,  sin(i,  k)=  si  sin(i-l,  k)  =  sO,  a(i,  k)  =  al  a(i-l,  k)  = 
aO,  a(i+l,  k)  =  a2,  b(i,  k)  =  bl,  b(i+l,  k)  =  b2,  a(i,  k+1)  =  a3 
b(i,  k+1)  =  b3 


4  1 

Substituting  the  above  parameters  in  equation  (2.11) 

X  XX 

b3  a3  X 

.XXX 

■  1  0  0 1  r  cO  sO  0l  (■  aO  bl  0  1  r  CO  -sO  0l  r  1  0  O' 

=  0  cl  si  -sO  cO  0  bl  al  b2  sO  cO  0  0  cl  -si 

.O-slclJL  0  0  iJL  0  b2a2jL  0  0  iJLosl  cl. 

■  1  0  0  1  r  cO  sO  Ol  r  aO  bl  0 1  r  cO  -clsO  -sOsl' 

=  0  cl  si  -sO  cO  0  bl  al  b2  sO  cOcl  -cOsl 

.O-slclJLo  OljLob2a2jLo  si  cl  . 

10  0  cOaO+sObl  cObl+sOal  s0b2  cO-clsO-sOsl 
=  0  cl  si  0  -sObl+cOal  c0b2  sO  cOcl  -cOsl 

.0-s1ciJl  0  b2  a2jL0  si  cl  . 


By  solving  the  above  matrix  the  value  of  b(l,  k+1)  can  be  evaluated  as 

TcO' 

b(i,  k+1)  =  (  0  ,  cl(-sObl  +  cOal)  +  slb2,  c0clb2  +a2sl)  sO 

.0. 

Generalizing  for  the  i"^  element: 

b(i,  k+1)  =  s(i-l,  k){c(i,  k)[c(i-l,  k)a(i,  k)-s(i-l,  k)b(i,  k)]  +  s(i,  k) 

b(i+l,  k)}  (2.12) 

Let  w  =  c(i,  k)[-s(i-l,  k)  b(i,  k)  +  c(i-l,  k)  a(i,  k)]  +  s(i,  k)  b(i+l,  k) 

Substituting  w  in  (2.12) 

b(i,  k+1)  =  s(i-l,  k) .  w  (2.13) 

Similarly  for  a(i,  k+1) 

-clcO 

a(i,k+l)  =  (0,  cl(-sObl +c0al)  +  slb2,  c0clb2+a2sl)  cOcl 

.  si  . 


(2.14) 


a(i,  k+1)  =  c(i-l,  k)c(i,  k){c(i,  k)[c(i-l,  k)a(i,  k)-s(i-l,  k)b(i,  k)]+s(i,  k) 
b(i+l,  k)}  +s(i,  k)[c(i-l,  k)c(i,  k)b(i+l,  k)+s(i,  k)a(i+l,  k) 

Let  V  ^  c{i-L  k)  c(i,  k)  b(i+l,  k)  +  s(i,  k)  a(i+l,  k) 
substituting  w  and  v  in  equation  (2.14) 

a(i,  k+1)  =  c(i-L  k)  c(i,  k)  .  w  +  s(i,  k) .  v  (2.15) 

Similarly  values  of  sin(i,  k)  and  cos(i,  k)  can  be  calculated  using  the  following 
general  relation. 

cos(i,  k)  =  x(i+l,  k)/r  (2.16) 

sin(i,  k)  =  b(i+l,  k)/r  (2.17) 

where  x(i+l,  k)  is  the  updated  a(i,  k)  after  (i-l)**'  rotation  and  is  given  by 

x(i+l,  k)  =  -sin(i-l,  k)b(i,  k)  +  cos(i-l,  k)a(i,  k) 

and 

r  =  sqrt{[b(i,  k)]^  +  x[(i+l,  k)]^} 

Using  the  above  values,  a  pseudocode  for  the  QR  algorithm  can  be  written.  In 
this  algorithm  first  parameters  are  initialized  and  sine  and  cosine  of  the  angle  of 
rotation  using  elements  of  the  matrix  as  in  the  equations  (2.16)  and  (2.17)  are 
computed.  New  values  of  diagonal  elements  a^  and  subdiagonal  elements  b^  are 
then  computed  using  equations  (2.13)  and  (2.15).  This  process  of  computing  a^  and 
b  g  is  repeated  as  long  as  all  the  b^  become  0  or  negligible  compared  to  diagonal 
elements  a^.  The  pseudocode  which  performs  the  QR  method  is  as  follows; 


x(i,  k)=0;  b(l,  k)=o;  a(0,  k)=0;  b(n+l,  k)=0;  c(0,  k)=l;  s(0,  k)  =  0; 
c(n+l,  k)-l;  s(n+l,  k)=0;  xl=0;  n=0; 
k  =0 
repeat 

for  i=l,m 

x(i+l,  k)=  -  s(i-],  k) .  b(i,  k)  +  c(i-l,  k) .  a(i,  k) 
r=sqrt{  [b(i,  k)]V[x(i+l,  k)]^ } 

if  r  >  0 

c(i,  k)  =  x(i+l,  k)/r 
s((i,  k)  =  b(i+l,  k)/r 

else 

c(i,  k)  =  1 
s(i,  k)  =  0 

end  if 

w=c(i,  k) .  x(i+l,  k)+s(i,  k) .  b(i-t-l,  k) 

v=c(i-l,  k) .  c(i,  k) .  b(i+l,  k)  +  s(i,  k) .  ati+1,  k) 

b(i,  k+l)=s(i-l,  k) .  w 

a(i,  k+l)=c(i-l,  k) .  c(i,  k) .  w  +  s(i,  k) .  v 

for  j=l,m 

u(i,  j,  k+1)  =  c(i,  k)  .  u(i,  ],  k)+s(i,  k)  .  u(i+l,  j,  k) 
u(i+l,  j,  k-fl)  =  -s(i,  k)  .  u(i,  j,  k)+c(i,  k)  .  u(i+l,  j,  k) 
end  for 
end  for 
k  =  k+l 

until  sum  of  squares  of  b  =  0 


2.5:  POWER  METHOD 


Estimation  of  the  DOA's  is  performed  by  using  the  eigenvectors 
corresponding  to  the  m  -  d  smallest  eigenvalues.  It  was  seen  earlier  that  the 
direction  vectors  corresponding  to  the  d  sources  are  orthogonal  to  the  true 

eigenvectors  e  d+i/  ed+2  / . Sm  ■  However,  in  a  practical  situation,  only  estimates  of 

these  eigenvectors  are  available  and  thus  it  is  expected  that  the  orthogonality 
criterion  does  not  hold  exactly.  Instead  the  projection  of  these  direction  vectors  on 
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the  subspace  spanned  by  the  estimated  eigenvectors  are  used  in  potential  solution, 

that  is  the  cosine  between  each  of  the  direction  vectors  and  the  estimated 
eigenvectors  e  e  ^^^2 ' . ®  m  expected  to  be  close  to  zero. 

a*(0i)E„-O  i  =  l,2,...d 
“  ®d+l'®d+2' . ®  m 

The  following  function  is  chosen  as  one  possible  measure  of  closeness  of  an 
element  of  the  array  manifold  to  the  signal  subspace. 

- -  (2.18) 

a*(0)  E  „E*  „  a(0) 

and  the  dominant  d  peaks  over  0  e  (-tc,  tc)  are  the  desired  estimate  of  the  directions 
of  arrival. 

2.6:  COMPUTATIONAL  MODULES  FOR  ESPRIT 

A  detailed  flowchart  for  the  computation  of  the  ESPRIT  is  shown  in  Figure 
2.4.  As  shown  in  the  flowchart  for  pipelined  arrangement  for  ESPRIT  most  of  the 
operations  are  similar  to  MUSIC  algorithm.  The  design  of  additional  required 
computation  modules  for  ESPRIT  such  as  multiplication  of  matrices  and  inversion 
of  matrices  are  in  progress. 


Eigendecomposition  of  using  QR  Method 
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Chapter  III 


HARDWARE  IMPLEMENTATION 


3.1:  INTRODUCTION 

With  the  advances  in  the  area  of  VLSI  it  is  now  possible  to  design  special 
purpose  hardware  for  the  implementation  of  various  real  time  algorithms.  The 
customized  hardware  has  two  main  advantages  as  listed  below. 

1)  The  given  algorithm  is  executed  at  a  high  speed. 

2)  Cost  and  size  of  the  hardware  will  be  lower  than  the  cost  of  a  general  purpose 
computer. 

These  advantages  have  led  many  researchers  to  probe  into  the  possibility  of 
designing  special  purpose  hardware.  The  development  of  special  purpose  hardware 
will  need  to  exploit  pipeline,  parallel  and  distribu*-ed  processing  approaches  to 
achieve  high  throughput  rates.  Hence  in  this  section  we  present  the  first  step  in  the 
development  of  a  dedicated  chip  set  to  support  MUSIC  and  ESPRIT  algorithm 
which  has  been  explained  in  previous  sections. 

There  are  many  architectures  such  as  systolic  array,  SIMD  Cordic  Processors  and 
MIMD  v.'hich  can  be  used  for  parallel  implementation.  An  appropriate  structure 
which  can  exploit  maximum  paralielization  to  reduce  the  computation  time 
will  be  selected  for  real  time  implementation  for  the  particular  application. 


4  7 
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3.2:  LITERATURE  SEARCH 

Various  papers  pertaining  to  parallelization  of  Householders  and  QR 
algorithms  were  reviewed.  C.F.T.  Tang  et  al  [12]  and  K.J.R.  Liu  [13]  proposed 
architecture  for  complex  Householder  transformations  for  triangularization  of  the 
matrix.  In  their  architecture  they  used  single  column  with  the  number  of  processors 
equal  to  the  number  of  columns  of  the  matrix.  Each  processor  performs  operation 
on  each  column.  After  each  iteration  the  values  of  each  column  are  fed  back  to  the 
same  processors.  But  their  architecture  is  proposed  to  perform  triangularization  of 
the  given  matrix  whereas  we  are  interested  in  tridiagonalization  of  the  covariance 
matrix. 

QR  method  for  the  tridiagonal  matrix  is  implemented  by  W.  Phillips  [15].  In 
his  architecture,  rectangular  systolic  array  is  used  in  which  each  processor  performs 
single  iteration.  When  the  first  iteration  is  performed  on  the  m  row  by  the  k 
processor,  the  second  iteration  is  performed  on  the  (m-1)  row  by  the  (k-1) 
processor  and  so  on.  But  the  disadvantage  in  this  approach  is  that  the  number  of 
processors  is  dependent  on  the  the  number  of  iterations,  i.e.,  if  5  iterations  are 
required  then  5  processors  are  required.  But  the  exact  number  of  iterations  is  not 
known  which  leads  to  the  uncertainity  of  the  required  number  of  processors. 

K.J.R. Liu  [16]  has  proposed  another  kind  of  approach  in  which  a  systolic  array 
arranged  in  a  matrix  form  is  used.  The  number  of  processors  is  equal  to  the  number 
of  elements  of  the  matrix.  During  first  step,  the  matrix  Q  is  found.  Then  new  values 
of  matrix  A  are  then  calculated  using  Q.  Convergence  for  all  the  elements  of  the 
matrix  other  than  the  diagonal  elements  is  checked.  If  all  the  elements  of  the  matrix 
other  than  the  diagonal  elements  are  not  equal  to  zero  then  the  same  systolic  array 


is  used  for  the  next  iteration.  These  iterative  computations  are  used  until  all  the 
elements  of  the  matrix  except  the  diagonal  elements  converge  to  zero.  The  obvious 
advantage  is  that  the  same  set  of  processors  can  be  used  for  all  the  iterations.  But  the 
drawback  is  that  this  architecture  is  proposed  for  the  evaluation  of  eigenvalues  on 
the  dense  matrix.  In  the  next  sub-section  systolic  architecture  for  the  previously 
developed  parallel  algorithms  for  the  computations  of  covariance  matrices, 
householders  transformation  and  QR  method  is  presented. 


3.3:  SYSTOLIC  ARCHITECTURE  FOR  FORMATION  OF  DATA  COVARIANCE 
MATRIX. 


The  parallel  computation  of  the  data  covariance  matrix  is  performed  using 
systolic  architecture.  As  stated  earlier  the  covariance  matrix  is  Hermetian  and 
computation  of  lower  triangular  elements  of  the  covariance  matrix  is  sufficient  to 
get  the  information  for  the  entire  matrix.  Since  there  are  36  elements  in  the  lower 
triangular  8x8  matrix,  systolic  architecture  will  have  36  processors.  Here  a 
triangular  arrangement  of  the  systolic  array  with  global  routing  is  considered  as 
shown  in  the  Figure  3.1.  Each  processor  is  numbered  as  P(mn)  where  m  is  the  row 
number  and  n  is  the  column  number.  The  sampled  data  from  the  i*^'  sensor  is  sent 
to  the  i^^  row  and  the  i'^’  column  simultaneously.  For  example  the  sampled  data 
from  the  3'"’^  sensor  is  sent  to  all  the  processors  in  the  third  row  and  the  third 
column.  Each  processor  performs  multiplication  and  addition  of  two  sampled  data 
in  parallel  in  ail  the  processors  for  every  clock  cycle  .  Since  there  are  36  processors,  36 
multiplications  and  36  additions  are  performed  simultaneously.  Each  processor  has 
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5  =  0 
do  i=l,p 
S  =  S  +  A.B 
end  do 
S  =  S-Hp 


Figures. 1;  Systolic  architecture  for  computation  of  covariance  matrix 


a  memory  to  store  the  product  of  multiplication  which  is  added  to  the  product 

obtained  during  the  next  data  cycle.  Once  the  operations  of  multiplication  and 

addition  for  all  the  sampled  data  in  all  the  processors  is  performed,  the  stored  data 

in  each  processor  is  then  divided  by  the  number  of  samples  in  all  the  processors  in 
parallel.  The  resulting  output  are  used  to  form  the  data  covariance  matrix  R 

3.4:  SYSTOLIC  ARCHITECTURE  FOR  HOUSEHOLDER  TRANSFORMATIONS 

The  previously  comp’’'-ed  covariance  matrix  is  a  dense-  matrix.  Given  a  dense 
matrix  it  needs  to  be  reduced  to  a  tridiagonal  matrix  using  the  Householder 
transformations.  As  seen  fror.i  the  flow  chart  in  Figure  2.3  and  equation  (2.10)  the 
determination  of  all  the  d  s  and  the  new  elements  of  the  columns  of  the  matrix  can 
be  computed  in  parallel.  A  systolic  architecture  is  proposed  for  the  computation  of 
tridiagonal  matrix.  Thus  this  algorithm  can  be  mapped  on  a  systolic  architecture 
with  the  number  of  processors  equal  to  m-^l,  where  m  is  the  order  of  the  matrix. 
Systolic  arrangement  for  8  x  8  matrix  i*  3  shown  in  the  Figure  3.2.  The  columns  of 
the  matrix  are  sent  to  each  processor  in  a  pipelined  fashion  in  reverse  order  such 
that  the  last  element  of  the  column  becomes  the  first  element  of  the  column.  The 
Processor  PEI  is  used  to  find  the  w  and  c  required  by  other  processors  to  find  the  dj. 
Processors  PE2,  PE3...  PE8  are  used  to  find  ds  using  the  value  of  w  and  c  found  in  the 
first  processor.  All  the  d  s  are  evaluated  in  parallel  and  are  sent  to  the  processor  PE9. 
Processors  PE2,  PE3...  PE8  has  the  memory  to  store  all  the  w  s,  d  s  and  the  elements  of 
that  column  of  the  matrix.  The  processor  PE9  is  exclusively  used  for  the 
determination  of  Vs  using  d  s  and  Ws-  The  Vs  are  then  routed  back  to  all  the 
processors.  The  processors  PE2,  PE3...  PE8  uses  w,  d  and  v  to  find  the  new  values  of 
the  elements  of  the  columns  in  parallel.  At  the  same  time  the 


first  processor  is  used  for  the  evaluation  of  p.  The  first  element  of  the  first  column 
and  (3  are  the  output  of  the  first  iteration  which  are  used  as  input  for  evaluation  of 
eigenvalues  using  QR  method.  The  counter  is  used  to  set  the  number  of  iterations 
to  m-2.  For  m-2  times,  the  intermediate  results  are  used  in  feedback  loop  and  the 
same  set  of  processors  are  used  repeatedly.  The  feedback  loop  has  a  FIFO  memory  to 
temporally  store  all  the  elements  of  the  column  until  operations  on  previous 
iteration  are  completed.  For  the  first  iteration  operations  on  8  x  8  matrix  are 
performed  hence  all  the  processors  are  utilized.  For  the  second  iteration  operation 
on  7  X  7  matrix  are  performed.  Now  the  first  column  of  the  matrix  is  already 
computed;  therefore  new  elements  of  the  second  column  are  fed  back  to  PEI, 
elements  of  the  third  columii  are  fed  back  to  PE2  and  so  on.  Thus  for  the  second 
iteration  PE8  does  not  have  any  column  to  work  on  and  is  used  only  for  routing 
purposes.  All  other  processors  perform  same  operation  as  in  the  first  iteration,  but 
the  elements  of  each  column  are  reduced  by  one  element.  Thus  for  every  new 
iteration  the  columns  and  the  elements  of  the  columns  goes  on  reducing.  Various 
operations  performed  by  different  processors  are  given  in  Figure  3.3. 

3.5:  SYSTOLIC  ARCHITECTURE  FOR  QR  METHOD 

The  tridiagonal  matrix  as  obtained  from  the  Householders  transformations  is 
reduced  to  a  diagonal  matrix  using  the  QR  method  as  discussed  earlier.  The  QR 
algorithm  as  explained  in  the  pseudocode  is  described  again  in  the  form  of  parallel 
flowchart  of  Figure  3.4.  The  systolic  architecture  uses  2*(m+l)  processors  where 'm' 
is  the  number  of  rows  or  columns  of  the  tridiagonal  matrix.  Systolic  architecture 
for  8  X  8  matrix  is  shown  in  Figure  3.5.  Each  column  has  (m+1)  processors.  Two 
kinds  of  processors  are  used.  The  processors  PEI  and  PE2  are  used  to  compute  the 
eigenvalues  and  all  the  other  processors  are  used  to  find  the  eigenvectors  of  the 


Processor  PEI 


V  c 


1=1 

s=0 

do  l=8,i+lj-l 
v(l-l)=r(l) 
s  =  s  +  v(l-l)*v(l-l) 
end  do 
p=  sqrt(s) 
v(i)=r(i+l)  +  p 
c=  s+  r(i+l).p 
i=i+l 


^  d  c  V 


p=0 

do  L  =  8,i+l,-l 
p  =  p+v(L-l)*r(L) 
end  do 
p=p/2(c’*'c) 
do  L  8,1+ 1,-1 
v(L-l)  =  d-v(L-l)*p 
end  do 
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V  d  c  V 


d=0 

dol  =  8,i+l,-l 
d=d+r(l)*v(l-l) 
end  do 
is  V  in  ? 
do  l=8,i+l,-l 

r  (21 ) =r  ( 1 1 )  -  v(l )  V- V  (1 )  V 
end  do 

Processors  PE2.PE3,..PE8 


Processor  PE9 


Figure  3.3:  Operation  performed  by  different  processors, 
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Figure  3.4;  Parallel  Flowchart  for  QR  algorithm 


Diagonal 

elements 


Subdiagonal 

elements 
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tridiagonal  matrix.  The  diagonal  and  subdiagonal  elements  of  the  tridiagonal  matrix 

from  the  Householders  transformation  is  pipelined  into  the  first  processor  PEI 
which  performs  the  first  iteration.  The  new  values  of  sine  and  cosine  of  rotation  a  ^ 

and  b  ^  are  computed.  The  values  of  sine  and  cosine  are  sent  in  pipelined  fashion  to 

all  processors  in  that  column.  These  processors  PEll,  PE21....PE81  find  the 
eigenvector  of  the  tridiagonal  matrix.  The  new  values  of  a  s  and  bg  computed  in  the 
first  processor  are  sent  to  the  processor  PE2  which  performs  the  second  iteration  to 
find  the  eigenvalue,  performs  the  same  operation  as  performed  by  PEI.  The 
processors  PE12,  PE22...PE82  find  the  eigenvectors  for  the  second  iteration.  The 
processor  PE2  also  performs  the  test  for  the  convergence.  If  the  sum  of  squares  of 
the  subdiagonal  elements  is  not  nearly  equal  to  zero  then  the  control  will  go  into 
the  feedback  loop  to  repeat  the  computations.  This  processor  is  repeated  as  long  as 
only  the  diagonal  elements  are  left  which  form  the  eigenvalues.  Various  operations 
performed  by  the  processor  are  given  in  Figure  3.6. 

3.6:  HARDWARE  IMPLEMENTATION  OF  POWER  METHOD 

Once  the  eigenvectors  have  been  computed,  the  value  of  eigenvector  are 
utilized  to  calculate  the  power  as  given  by  equation  given  in  chapter  II  and  is 
repeated  below: 

P^(0)  = - - - 

a’^(e)  E  „E*  ^  a(e) 

As  seen  from  this  equation  the  evaluation  of  a*(0)  E  ^^E"^  ^  a(0)  requires  squaring  of 
the  product  of  rowvecici  7^  u(0)  and  the  pjgenvp^'^or  E  ^  Hence  the  product 


Processor  PEI,  PE2 


cl  si 


new 

al 

new 

bl 


x2  =  -sl.b2  +  cl  al 
r  =  sqrt(x2.x2+b2.b2) 
if  X  >  0 
cl  =  x2/r 
s  1  =  b2/r 
else 
cl  =  1 
si  =0 

w  =  cl.x2+sl.b2 
V  =  c0.cl.b2  +si.a2 
al  =  c0.cl.w+  sl.v 
bl  =  sl.w 


Processors  PE11,PE12, . PE81,PE82 


cl  =  cO  si  =  sO 
al  =  clxl  =  sl.aO 
pi  =  cl.aO-  sl.pO 


Figure  3.6:  Operation  performed  by  processors 


of  a(0)  and  E  n  is  evaluated  and  then  the  product  obtained  is  squared  and 
accumulated  for  all  the  values  in  the  array  manifold.  The  hardware  design  is 
shown  in  the  Figure  3.7.  It  consists  of  a  set  of  8  multipliers  to  find  the  product  of  a(6) 
and  E  n  in  parallel.  The  product  obtained  is  then  summed  using  adders.  The  real  and 
imaginary  and  are  squared  and  added  again.  The  evaluation  E*n^(6)  is  similar  to 
the  evaluation  of  a*(9)  E  „  and  hence  the  product  of  a*(0)  E  „  is  squared  and  added. 
The  angle  of  arrival  is  thus  calculated  for  different  values  of  the  angle.  The  value  of 
P  m  (0)  is  different  for  different  angle  The  angle  for  which  P  (0)  is  maximum  is 
the  angle  of  arrival  giving  the  angle  of  arrival. 

3.7:  HARDWARE  BLOCK  DIAGRAM  OF  MUSIC  AND  ESPRIT  ALGORITHMS 

The  hardware  block  diagram  for  the  MUSIC  Algorithm  is  shown  in  Figure 
3.8.  As  seen  in  this  Figure,  the  data  collected  from  the  sensors  is  utilized  to  form  the 
covariance  matrix.  The  eigendecomposition  is  performed  using  Householders 
transformation  and  QR  method.  The  Eigenvalues  are  used  to  find  the  number  of 
sources  and  finally  using  the  eigenvectors  in  Power  method  we  find  the  angle  of 
arrival.  The  Hardware  block  diagram  for  ESPRIT  algorithm  is  shown  in  Figure  3.9. 
It  is  similar  to  the  MUSIC  algorithm  but  instead  of  power  method  of  MUSIC  some 
more  computations  are  performed  to  evaluate  the  angle  of  arrival  in  case  of  FSPRIT. 


Figure  3.7:  Flowchart  for  power  method 


Figure  3.8:  Hardware  block  diagram  for  MUSIC  algorithm 


Figure  3.9:  Hardware  block  diagram  of  ESPRIT  algorithm. 


Chapter  IV 


THE  CORDIC  PROCESSOR  APPROACH 

4.1:  INTRODUCTION 

The  COORDINATE  ROTATION  DIGITAL  COMPUTER  (CORDIC) 
technique  introduced  by  Voider  [17]  is  highly  suitable  for  the  efficient 
computation  of  elementary  functions  like  multiplication,  division, 
trigonometric  functions,  logarithms,  exponentials,  square  root  and  vector 
rotation.  This  technique  has  been  proposed  by  number  of  researchers  [17-25]  for 
real  time  digital  signal  processing  applications.  Basically,  the  CORDIC  algorithm 
consists  of  a  set  of  iterative  plane  rotations  in  a  linear,  circular  or  hyperbolic 
coordinate  system,  depending  on  which  function  is  to  be  calculated.  The  only 
operations  required  for  the  above  functions  are  shifting,  adding,  subtracting  and 
the  use  of  look  up  tables  as  described  by  Voider  [17].  In  terms  of  hardware 
complexity,  CORDIC  processing  elements  are  quite  simple  and  consume  less 
chip  area.  Here  we  present  an  alternate  method  for  the  computation  of 
eigendecomposition  on  an  array  of  CORDIC  processors.  But  first  the  operating 
principles  of  the  CORDIC  processors  are  described. 

4.2:  THE  CORDIC  ALGORITHM: 

Consider  a  plane  coordinate  system  where  the  radius  R  and  the  angle  0  of  a 
vector  P(x,y)  as  defined  by  Walther  [18]  and  Ahmed  [23] 


R  =  V  x“  +my“ 

(4.1) 

0  =  ^tan  '(y/m  /x) 

/m 

(4.2) 
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and  m  =  1  for  circular  coordinate  systems, 
m  =  0  for  linear  coordinate  systems, 
m  =  -1  for  hyperbolic  coordinate  systems. 

Note  that  for  m  =  -1,  Equation  (4.2)  defines  a  complex  quantity,  which  can  be 
simplified  by  rewriting  Equation  (4.2)  in  terms  of  hyperbolic  trigonometric 
functions  [18]. 

After  one  rotation,  the  new  vector  Pi+i  =  (x-^|,yj^p,  as  shown  in  Figure  4.1 
obtained  from  Pi  =  (x-,yi)  can  be  expressed  as 

X.  ,  =  X.  -  m  o.  5.  y.  (4.3) 

and 

y.  ,  =  y  +  a  5.  X.  (4.4) 


where, 

ai=  +1  for  counter  clockwise  direction  of  rotation  and 
aj=  -1  for  clockwise  direction  of  rotation. 

5  =  Integral  power  of  the  machine  radix  given  by 

2'{i-2)  where  'i'  is  the  iteration  number  of  the  current  CORDIC  iteration, 
m  =  Parameter  for  the  coordinate  system. 

After  n'  iterations,  the  new  radial  and  angular  components  of  the  vector  P  are 
given  by 
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Figure  4.1:  Origir\al  vector  P  and  the  same  vector  after 
rotation  by  ±a  [17]. 


On  =  Sq  +  a 

(4.5) 

and 

Rn  =  Ro  *  K 

(4.6) 

where 

a  and  K  are  given  by  the  following  relations  [23] 

n- 1 

a  =  ^  o.  tti 
1=0 
n- 1 

=  (o,  /  Jm  )  tan~‘(5;7m  )  (4.7) 

1=0 

n- 1  n- 1  _ 

K=n =  riv (4.8) 

1=0  1=0 


where, 
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a  =  (1  /  Vm )  tan  '(/m  5^) 
and 

K.  =  7l+ni8' 

For  n  =  24  ar\d  m=l,  the  value  of  K  will  be  1.646760255 

A  third  variable  'Z’  is  provided  to  keep  track  of  the  amount  of  the  angle 
variations; 

Zi+i  =  Zi  +  tti  (4.9) 

Solving  the  set  of  difference  Equations  (4.3),  (4.4;,  and  (4.9),  for  'n'  iterations  [18] 
gives 


=  K  {  Xq  cos(a  Jm )  +  y(/m  sm(aJm ) } 

(4.10) 

=  K  { sin(a/m )  +  cos((x/m ) } 

(4.11) 

=  Zo  +  a 

(4.12) 

These  relations  are  summarized  in  Figure  4.2. 

The  scale  factor  K  which  is  inadvertently  generated  by  the  CORDIC  cycles  is 
not  useful  and  is  unwanted  at  the  output.  This  K  can  be  normalized  to  unity  as 
suggested  by  many  authors.  It  was  suggested  by  Haviland  and  Tuszynski  [19]  to 
insert  scaling  cycles  into  the  CORDIC  iteration  which  scale  the  magnitude  of  the 
vector  being  rotated  by  the  5j  used  in  the  CORDIC  iteration  preceeding  the  current 

iteration.  For  n'  CORDIC  iterations,  approximately  n/2  scaling  cycles  are  needed 
in  order  to  cause  the  scale  factors  to  converge  to  unity,  thus  imposing 


X 


X 

Y 


X 


X 

y 

z 


Kj (xcosz-ysinz) 
(ycosz+xsinz) 

0 


Circular  m=l, 
z  tends  to  0 


Linear  m=0, 
z  tends  to  0 


X 

y 

z 


K  j (xcoshz+ysinhz) 
K_  ^(ycoshz+xsinLz ) 
0 


Hyperbolic 
m=-l,  z  tends  to  0 


z-tan~^(y/x) 


Circular  m=l, 
y  tends  to  0 


Linear  m=0 
y  tends  to  0 


Hyperbolic 
m=-l,  y  tends  to  0 


Figure  4.2;  Input  output  functions  for  CORDIC  modes  [18] 


a  speed  penalty.  It  has  also  been  suggested  by  Ahmed,  Ang  and  Morf  [20]  to 
combine  a  scaling  cycle  with  the  CORDIC  iteration  preceding  the  current 
iteration.  This  procedure  overcomes  the  speed  penalty,  but  at  the  expense  of 
chip  area.  Lange  et  al  [21]  suggested  the  implementation  of  the  normalization 
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factor  for  K  together  with  the  floating  point  postprocessing  at  the  output.  This 
normalization  factor  introduces  an  acceptable  relative  error  of  2  . 


4.3:  CORDIC  FUNCTIONS  AND  ACCURACY: 

The  value  'n'  in  the  above  expressions  gives  the  number  of  CORDIC 
iterations  per  operation.  This  is  set  by  the  word  length  or  the  number  of  bits  in 
the  inputs  x  and  y,  which  directly  gives  the  number  of  iterations.  The  number 
of  bits  is  also  an  index  of  the  CORDIC  operation,  and  accordingly,  the  number  of 
bits  may  be  chosen  for  a  required  accuracy.  However,  there  is  a  upper  limit 
constraint  on  having  more  number  of  bits  and  thus  more  number  of  iterations 
since  it  results  in  slower  operation  and  consumes  more  chip  area.  Hence,  there 
is  a  trade-off  in  the  selection  of  word  length.  The  rotations  are  performed 
iteratively  in  many  small  steps  rather  than  in  one  single  step  or  one  rotation 
because  the  sequence  of  angles  {a .}  are  so  chosen  that 

=  tan  (a./rn )  /  /in  =  2  ^  ^ 

is  an  integral  power  of  the  machine  radix  (which  is  2  in  our  case  since  we  are 
using  binary  number  representation).  The  multiplication  by  6i  in  Equations 
(4.3)  and  (4.4)  may  therefore  be  implemented  as  a  shift  operation,  one  shift  per 
iteration.  Note  that,  for  every  iteration,  we  get  one  bit  of  the  final  answer. 
Hence,  we  neither  require  a  multiplication  facility  nor  evaluate  sines  and 
cosines  of  arbitrary  intermediate  angles.  Figure  4.2.  illustrates  two  special  cases: 

1 )  y  is  forced  to  zero  :  yn  =  0 

2)  z  is  forced  to  zero  :  Zn  =  0 

These  two  cases  generate  many  elementary  functions.  Whether  yn  or  Zn  is  forced 
to  zero  can  be  taken  care  of  by  the  proper  choice  of  o  or  the  direction  of  rotation. 
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The  identities  given  in  the  foot  note  are  used  to  simplify  the  above  results.  It 
can  be  seen  that  by  proper  choice  of  the  initial  values  the  following  functions 
can  be  obtained  or  generated  [18]. 
x^z,  y/x, 
sin  z,  cos  z, 
sinh  z,  cosh  z, 
tan'l(y/x),  tanh'^(y/x) . 
tan  z  =  sin  z  /  cos  z 
tanh  z  =  sinh  z  /  cosh  z 
exp.z  =  sinh  z  =  cosh  z 
logn  w  =  2tanh-l  [y/ x] 
where  x  =  w+1  and  y  =  w-1 

=Vx--y- 

where  x  =  w  +  l/4  and  y  =  w-l/4 

Foot  note;  Mathematical  identities^ 

Let  i  =  -Ta 


z  =  lim  i//m  sin(  z/rn  )  II 

m-»0 

z  =  lim  l/^rh  tan  ‘ (  z/rh  )  12 

m->0 

sinh  z  =  -i  sin(  i  z  )  13 

cosh  z  =  cost  i  z  )  14 


tanh  z  =  -i  tan' t  i  z  ) 


15 
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4.4:  THE  CORDIC  ITERATION: 

In  a  CORDIC  iteration,  the  new  value  of  the  components  x  and  y  of  any 
vector  P  is  obtained  by  adding  or  subtracting  a  shifted  value  of  one  component 
to  the  other.  The  components  are  shifted  by  (i-2)  bits  to  the  right,  where  i  is  the 
iteration  number,  for  every  iteration  and  then  cross  addition  or  subtraction  is 
performed.  These  iterations  are  called  microrotations,  and  every  microrotation 
produces  one  'bit'  of  the  answer.  Rewriting  Equations  (4.3)  and  (4.4),  we  have, 

-(i-2) 

X.  ,  =  X  -  m  a.  2  V.  (4. 13) 

^-^i  1  I  - 1  ' 

-(i-2) 

+  Oi2  X.  (4.14) 


It  can  be  seen  that  y^^^  is  obtained  by  adding  a  shifted  value  of  to  yj^  and 
similarly,  Xj^j  is  obtained  by  adding  a  shifted  value  of  y.  to  x.- 


By  restricting  the  angular  increment  a  to 


a.  =  tan 
1 


2-(i-2) 


or  in  general. 


the  right  hand  side  of  Equations  (4.13)  and  (4.14)  can  be  obtained.  In  CORDIC,  the 
angles  are  represented  as  binary  fractions  of  a  half  revolution,  with  the  negative 
angles  represented  by  2's  complements  [22].  The  angle  for  the  first  microrotation 
is  taken  to  be  90^-  To  satisfy  aj  =  90°,  and  a  =  ±1,  the  condition  is 

-180  <  0  <  +  180, 

which  is  taken  care  of  by  the  representation  as  shown  in  Figure  4.3. 


7  1 


1 

1. 


-135® 

I.V.. 

-90® or  270® 

1.10 

Figure  4.3:  Binary  representation  of  angles  in  CORDIC  [22] 


4.5;  THE  ANGLE  AND  ROTATION  COMPUTING  SEQUENCE 

The  CORDIC  operation  is  explained  by  an  example  and  is  illustrated  in 
Tables  4.1  and  4.2.  Table  4.1  shows  how  the  angle  is  computed  '■nd  Table  4.2 
shows  how  the  rotation  is  performed.  The  first  rows  of  the  Tables  show  the 
initial  values  of  the  X,Y  and  Z  registers.  The  X  and  Y  registers  have  the  matrix 
elements  to  be  rotated  as  the  initial  values.  In  this  example,  the  word  length  of 
the  three  registers  are  chosen  to  be  8  bits  with  the  MSB  being  the  sign  bit. 
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4.5.1;  Angle  computation; 

In  Table  4.1,  the  initial  value  of  Z  is  set  to  zero.  The  sign  bit  of  Y,  is  used  as 
the  reference  to  determine  the  operation  sign  (whether  addition  or  subtraction) 
to  obtain  Yj+i,  Xi+i&  Z[+\  where  i  is  the  index  of  the  CORDIC  iteration  or 
microrotation  as  shown  below. 

[1]  Sign  bit  of  Y.  is  'O'  which  is  'positive' 

shifted  value  of  Y.  (addition) 

Y-^^  =  -  shifted  value  of  (subtraction) 

=  Zj  +  Oj  (addition) 

[2]  Sign  bit  of  Y.  is  '1'  which  is  'negative' 

Xj^i  =  Xj  -  shifted  value  of  Yj  (subtraction) 

shifted  value  of  Xj  (addition) 

Zj^i  =  Z.  -  Oj  (subtraction) 


For  example,the  sign  bit  of  Y3  is  0  indicating  positive.  The  operation  sign  for 
Z  and  X  is  that  is  the  value  of  Z4  is  obtained  by  'adding'  the  value  of  to  Z3 
and  the  value  of  X4  is  obtained  by  adding  the  value  of  X3  to  the  shifted  value  of 
Y3  The  operation  sign  for  Y  is  the  negative  of  the  sign  bit  and  thus,  Y4  is 
obtained  by  subhacting  the  shifted  value  of  X3  from  Y3  facilitating  Y  to  converge 
to  zero. 


As  mentioned  earlier,  t^he  angle  ai  is  chosen  to  be  90°  After  the  first 
microrotation,  the  new  value  of  X  is  the  previous  value  of  Y  and  the  new  value 
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Y  is  the  negative  of  the  previous  value  of  X  which  conforms  to  Equation  (4.10). 
The  second  miciorotation  is  performed  by  an  angle  tan'^  (2-0-2))  which  is  tan'^  (1) 
since  i=2  and  thus  there  are  no  shifts  but  only  cross  addition  or  subtraction.  The 
number  of  shifts  is  given  by  the  exponents  of  2  which  is  the  argument  of  the 
arctan  as  seen  in  the  second  column  of  the  Table.  Since  the  angle  steps  are  given 
by  the  arctan  whose  arguments  are  controlled  by  the  iteration  number,  for  any 
given  microrotation,  the  angle  is  fixed  irrespective  of  the  input  values.  Thus  the 
angle  steps  need  not  be  calculated  every  time  but  instead,  the  binary  CORDIC 
representation  as  shown  in  Figure  4.3  of  the  angle  steps  are  stored  in  the  form  of 
look-up  tables  and  recalled  at  the  beginning  of  every  iteration.  Since  seven  bita 
are  used  to  represent  X  and  Y,  seven  CORDIC  iterations  have  to  be  performed  for 
one  operation.  At  the  end  of  seven  iterations,  the  contents  of  the  Z  register  gives 
the  original  angular  argument  of  the  co-ordinate  components  Yjand  Xj.  In  our 
application,  only  the  value  of  the  angle  is  necessary  and  thus  the  X  and  Y  outputs 
are  ignored.  This  computed  angle  is  fed  as  the  Z  input  as  shown  in  Figure  4.4  for 
the  rotation  computation  described  below. 
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TABLE  4.1 


Sequence  of  steps  for  computing  angle 


Angle  step 

Shift  terms 

Z-Register 

Y-Register 

X-Register 

Index  #  i 

90 

tari^  oo 

0.0000000 

0.0101110 

1.1000101 

1 

+0.1000000 

-1.1000101 

+0.0101110 

45 

tan^  1 

0.1000000 

+0.0100000 

0.0111011 

-0.0101110 

0.0101110 

+0.0111011 

2 

26.56 

tah^  2^ 

0.1100000 

+0.0010010 

0.0001101 

-0.0110100 

0.1101001 

+0.0000110 

3 

14.04 

tah^  1 

0.1110010 

-0.0001001 

1.1011001 

+0.0011011 

0.1101111 

-1.1110110 

4 

7.125 

tan^  i' 

0.1101001 

-C.OOOOIOI 

1.1110100 

+0.0001111 

0.1111001 

-1.1111110 

5 

3.576 

-1 

tan^  2 

0.1100100 

+0.0000010 

0.0000011 

-0.0000111 

0.1111011 

+0.0000000 

6 

1.7899 

-1  -5 

tan^  2 

0.1100110 

-0.0000001 

1.1111100 

+0.0000011 

0.1111011 

-1.1111111 

7 

148.18 

=  0 

0.1100101 

=  0 

1.1111111 

0.1111100 

=  R 

8 

The  index  #  of  the  x,  y  and  z  register  values  are  indicated 
in  the  last  column  of  the  table. 


Angle  Processor 


Rotation  Processor 


(c=0) 


a  cos  8  -  b  sin  8 
a  sin  8  +b  cos  8 
0 


m=l,  b 


0 


m=l,  0 


0 


Figure  4.4;  Calculation  of  angle  6,  by  which  the  two  vectors 
'a'  and 'b' are  rotated. 


4.5.2:  Rotation  computation: 

The  sequence  of  steps  are  shown  in  Table  4.2.  Here,  the  sign  bit  of  Zk  is  used 
as  the  reference  to  determine  the  operation  sign  as  shown  below. 

[1]  Sign  bit  of  Z,  is  ’O'  which  is  'positive' 

^i+l  ^  *  shifted  v'alue  of  (subtraction) 

Yj^^  =  +  shifted  value  of  Xj  (addition) 

Zj^^=Zj-aj  (subtraction) 

[2]  Sign  bit  of  Zj  is  T  which  is  'negative' 

-  Xj  +  shifted  value  of  Y^  (addition) 

Y|^^  =  Y|  -  shifted  value  of  X^  (subtraction) 

Z^  ^  -  Z.  -h  a.  (addidon) 

The  operation  sign  for  Y  is  the  sign  indicated  by  the  sign  bit  of  Z  and  the 
operation  sign  of  X  and  Z  is  the  neg.itive  of  the  sign  bit  of  Z.  After  7  iterations, 
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the  value  of  Z  is  zero,  meaning  that  the  rotation  has  been  performed  by  the 
complete  angle  9  until  8  =  6  ±  a  =0.  Similar  steps  of  the  angles  a’s  are  uspH  ;i<; 
in  the  angle  module.  The  contents  of  the  X  and  Y  registers  after  7  microrotations 
are  the  rotated  values  of  the  inputs  to  the  X  and  Y,  by  the  angle  given  by  the  Z 
register.  The  Z  output  is  ignored. 

TABLE  4.2 

Sequence  of  steps  for  computing  rotation 


I 

I 

s 

I 

I 

I 


Angle  step 

Shift  terms 

Z-Register 

Y-Register 

X-register 

Index  #  i 

0  =  142.18 

0.1100101 

0.0101110 

1.1000101 

1 

0 

90 

tan  ^  (°o) 

-0.1000000 

+1.1000101 

-0.0101110 

0 

45 

tan'^(l) 

0.0100101 

-0.0100000 

1.1000101 

+1.1010010 

1.1010010 

-1.1000101 

2 

26.56*’ 

tan'^(2*S 

0.0000101 

-0.0010010 

1.0010111 

+0.0000110 

0.0001101 

-1.1001011 

3 

4.04*’ 

-1  -2 
tan  (2  ) 

1.1110011 

+0.0001001 

1.0011101 

-0.0010000 

0.1000010 

+1.1100111 

4  . 

7.125° 

.1  -3 
tan  ^  (2  ) 

1.1111100 

+0.0000101 

1.0001101 

-0.0000101 

0.0101001 

+1.1110001 

5 

0 

3.576 

-1 

tan  ^2  ) 

0.0000001 

-0,0000010 

1.0001000 

+0.0000001 

0.0011010 

-1.1111000 

6 

1.7899® 

-1  -5 
tan  (2  ) 

1.1111111 

+0.0000001 

1.0001001 

-0.0000001 

0.0100010 

+1.1111100 

7 

0 

0 

0.0000000 

1.0001000 

0.0011110 

8 

=6 

=0 

The  index  i'  of  the  x,  y  and  z  register  values  are 
indicated  in  the  last  column  of  the  table 
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Thus  an  operation  takes  'i'  iterations,  i  being  the  word  length  and  it  can  be 
seen  that  in  each  iteration  only  shifting  and  addition  /  subtraction  are 
performed. 


4.6:  THE  QR  ALGORITHM 

The  QR  algorithm  has  already  been  described  in  chapter  II.  This  requires  the 
computation  of  the  following  equation,  rewritten  for  quick  reference. 

T  =0^0^  O^TO  OO 

k+l  ^n-l^n-2  k^l  ^n-2'^n 

The  algorithm  is  explained  using  an  example  of  a  5  X  5  symmetric,  tridiagonal 
matrix  T^ 

4,6.1:  Computation  of  eigenvalues: 

Eigenvalues  are  computed  in  accordance  with  the  above  equation  and  k 
iterations  will  be  performed.  One  such  iteration  is  shown  in  the  following 
example. 


“  1 

1 

0 

0 

0  ■ 

COS0 

sin0 

0 

0 

0  ‘ 

1 

1 

1 

0 

0 

-sin0 

COS0 

0 

0 

0 

0 

1 

2 

1 

0 

Qi^= 

0 

0 

1 

0 

0 

0 

0 

I 

2 

1 

0 

0 

0 

1 

0 

-  0 

0 

0 

1 

2  - 

-  0 

0 

0 

0 

1  - 

0i  =  tan  '  ^  =  26.565^' 
z 
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0  is  calculated  based  on  the  Given’s  rotation.  The  arguments  for  the 
arctan  are  obtained  from  the  first  column  of  the  window  selected  for 
the  operation. 


"  0.8944 

0.4472 

0 

0 

0  " 

“  2.236 

1.7888 

0.4472 

0 

0  " 

-0.4472 

0.8944 

0 

0 

0 

0 

1.34166 

0.8944 

0 

0 

0 

0 

1 

0 

0 

q|t,= 

0 

1 

2 

1 

0 

0 

0 

0 

1 

0 

0 

0 

1 

2 

1 

0 

0 

0 

0 

1  _ 

-  0 

0 

0 

1 

2  _ 

Gt— tan  j  34 =  36.6988 

1.7888  0.4472  0  0 

1.6733  1.9123  0.5976  0 
0  1.0691  0.8018  0 

0  1  2  1 
0  0  12 


T 

Q2H 


1  0  0  0  0 

0  0.8018  0.5976  0  0 

0  -0.5976  0.8018  0  0 

0  0  0  1  0 

L  0  0  0  0  1 


qIq|t,=i 


2.236 
0 
0 
0 

L  0 


Observe  that  the  matrix  T|  is  being  transformed  step  by  step  to  a  upper 

T 

triangular  matrix  through  row  operations  by  the  orthogonal  matrices  Q  . 


0,  =  tan' 


1.0691 


43.088 


Now  the  first  two  columns  are  free  and  we  can  perform  the  column 
operation  on  the  new  Tj  by  the  matrix  Q. 
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“  0.8944 

-0.4472 

0 

0 

0  “ 

“  2.8 

0.6 

0.4472 

0 

0  “ 

0.4472 

0.8944 

0 

0 

0 

0.7483 

1.5 

1.9124 

0.5976 

0 

0 

0 

1 

0 

0 

qIq|t,q,= 

0 

0 

1.0691 

0.8018 

0 

0 

0 

0 

1 

0 

0 

0 

1 

2 

1 

-  0 

0 

0 

0 

1  - 

-  0 

0 

0 

1 

2  - 

“  1 

0  0 

0 

0  “ 

"  2.8 

0.6 

0.4472 

0 

0 

0 

1  0 

0 

0 

qIqIq[T|Q,= 

0.7483 

1.5 

1.9124 

0.5976 

0 

=  0 

0  0.7303 

0.6831 

0 

0 

0 

1.4639 

1.9518 

0.6831 

0 

0  -0.6831 

0.7303 

0 

0 

0 

0 

0.9135 

0.7303 

-  0 

0  0 

0 

1  - 

-  0 

0 

0 

1 

2 

0,  =  tan 


1  1 


0.9135 


=  47.588® 


qIqIq|t,q,Q;  = 


2.8 

0.7483 

0 

0 

0 

0.7483 

2.345 

0.637 

0.5976 

0 

0 

0.8748 

1.1737 

1.9518 

0.68312 

0 

0 

0 

0.9135 

0.73031 

0 

0 

0 

1 

2 

“10  0 
0  1  0 
0  0  1 
0  0  0 
-000 


0  0  “ 

0  0 

0  0 

0.6744  0.7383 

-0.73832  0.67445  - 


qIqIqIqIt,q,q, 


2.8 

0.7483 

0 

0 

0  “ 

0.7483 

2.3455 

0637 

0.5976 

0 

0 

0.8748 

1.1737 

1.9518 

0.6831 

0 

0 

0 

1.35443 

1.9692 

0 

0 

0 

0 

0.8097  - 
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Q4QIQ2QiT,Q,Q2Q3 


2.8  0.7483  0 

0.7483  2.3455  0.8748 

0  0.8748  2.1905 

0  0  0.9252 

0  0  0 


0  0  " 

0  0 

0.6236  0.6831 
0.9891  1.9692 
0  0.8097  - 


t2-qJqIqIqTt,q,q,q3Q4= 


2.8 

0.7483 

0 

0 

0 


0.7483 

2.3455 

0.8748 

0 

0 


0  0  0  “ 
0.874  0  0 

2.1905  0.925  0 

0.925  2.121  0.598 
0  0.598  0.5461  - 


The  above  matrix  is  the  result  of  one  QR  iteration.  Observe  that  it  is 
symmetric  and  tridiagonal.  will  always  be  symmetrical  and  tridiagonal 
as  long  as  the  original  matrix  T|  is  symmetric  and  tridiagonal.  When  more 
iterations  are  carried  out,  the  off-diagonal  entries  converge  to  zero  resulting 
in  a  diagonal  matrix.  The  diagonal  elements  will  be  the  eigenvalues  of  the 
original  matrix  as  suov'^ti  below. 


Xj  0  0  0 
0  0  0 


0 


0  0 


0  0  0 


0 

0 

0 

0 


0  0  0  0  X5 


4.6.2:  Computation  of  eigenvectors; 

The  product  of  ail  the  matrices  which  were  used  for  the  eigenvalue 
computation,  gives  the  eigenvectors  of  the  matrix  T.  Instead  of  performing 
matrix  multiplications  which  is  both  time  and  area  consuming,  all  the 
operations  are  performed  starting  with  an  identity  matrix,  thus  obtaining  the 
same  results.  One  iteration  of  these  computations  are  shown  as  an  example 
below. 


qJ  I  = 


0.8944  0.4472  0  0  0 
-0.4472  0.8944  0  0  0 


0  0  1 


1  0  0  0  0 

0  0.8018  0.5796  0  0 

0  -0.5796  0.8018  0  0 

0  0  0  I  0 


-  0.8944  0.4472  0  0  0 

-0.3585  0.7171  0.5976  0  0 

qTqIi^  0.2672  -0.5345  0.8018  0  0 

0  0  0  1  0 


0  10  0  0 

0  0  0.7303  0.6831  0 

0  0  -0.6831  0.7303  0 

0  0  0  0  1 


“  0.8944  0.4472  0  0  0 

-0.3585  0.7171  0.5796  0  0 

0l02  Qh=  -0.3903  0.5856  0.6831  0 

-0.1826  0.3651  0.5477  0.7303  0 

-  0  0  0  0  1 


qC 


0  0  0  0.6744  0.7383 
0  0  0  -0.7383  0.6744 
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"  0.8944 

0.4472 

0 

0 

0  " 

-0.3585 

0.7171 

0.5976 

0 

0 

=  q]  qI  qI  I  = 

0.1952 

0.3903 

0.5856 

0.6831 

0 

-0.1231 

0.2463 

0.2463 

0.4926 

0.7383 

-  0.1348 

-0.2696 

-0.4044 

-0.5392 

0.6744  _ 

.th 

X  is  the  eigenvector  matrix  with  the  j  column  being  the  eigenvector 
corresponding  to  the  eigenvalue  X.j 


4.7:  STEP  BY  STEP  SUB-ARRAY  SELECTION  AND  PROCESSING 

As  explained  in  the  previous  sections  the  eigendecomposition  by  the  QR 
method  is  performed  as  a  sequence  of  row  operations  and  column  operations 

Qj,  where  i  =  1,2,3, . ,  (m-1)  and  m  is  the  order  of  the  input  matrix.  Since  we 

have  eight  sensors,  the  input  matrix  is  of  the  order  eight  and  there  are  seven  row 
operations  and  seven  column  operations.  Since  the  original  matrix  has  been 
transformed  to  a  tridiagonal  form,  we  have  to  process  at  most  three  elements  in 
every  row  or  column.  Each  operation  is  performed  on  two  rows  i  and  (i+l). 
Every  Qj  operation  is  done  on  two  columns  j  and  (j+1).  Thus  any  Qj^  or  Qj 
operation  produces  at  most  six  new  matrix  entries  and  we  need  to  process  only 
these  six  elements.  For  the  row  operation  this  is  achieved  by  selecting  a  2  X  3 
window  at  the  top  left  of  the  matrix  as  shown  in  Figure  4.5.  For  every 
consecutive  operation,  we  slide  down  the  window  diagonally  by  one  row  and 
one  column  until  the  end  of  the  matrix  is  reached.  The  last  row  operation 
changes  only  four  elements. 
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Similarly,  for  the  column  operations,  we  select  a  3  X  2  window  and  slide  it 
down  diagonally  until  the  end  of  the  matrix  is  reached.  This  performs  to 
operations  and  is  shown  in  Figure  4.6.  Note  that  the  Qi  operation  creates  only 
four  new  elements  and  thus  only  two  of  the  three  CORDIC  processors  do  the 
operation  and  the  third  processor  is  fed  with  zeros  as  inputs. 

For  a  given  i,  both  Qj^  and  Qj  use  the  same  angles.  Thus  if  all  the  row 
operations  are  carried  out  first  and  then  followed  by  the  column  operations,  all  the 
angles  in  one  iteration  have  to  be  stored  and  have  to  be  used  in  the  right  order 
again.  To  circumvent  this  drawback,  the  row  and  column  operations  are 
interleaved.  Another  advantage  of  interleaving  is  that  when  the  angle  processor 
is  calculating  6j  for  the  operation  the  rotation  processors  simultaneously 
perform  the  Qj.2  operation  as  explained  in  the  next  section. 
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Figure  4.0:  Selection  of  windows  for  the 
(J  operation  for  eigenvalues 
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Figure  4.7:  First  two  columns  ready  for  column  operation 
Note:  indicates  that  those  elements  have  been  rotated  and  their  values 

are  different  from  the  original  matrix. 

After  Qi^  and  Q2^  have  been  performed,  the  first  two  columns  of  the 
resulting  matrix  are  free  as  shown  in  Figure  4.7  and  are  available  for  the 
operation.  Meanwhile  we  calculate  the  angle  for  Q3^  by  computing  tan'^  (b3/a3). 
This  process  is  continued  and  when  all  the  operations  are  completed,  a  new 
iteration  is  performed  starting  from  row  1  or,  a  new  matrix  is  loaded  row  by  row 
if  the  required  number  of  x'  iterations  are  already  performed  on  the  previous 
matrix.  Since  the  loading  of  the  matrices  is  done  row  by  row,  the  and  Q7 
operation  on  the  previous  matrix  can  be  performed  without  being  affected  by  the 
new  iteration  or  new  matrix.  After  Q7,  we  start  with  on  the  next  iteration. 
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This  is  explained  with  the  flowchart  shown  in  Figure  4.11. 

The  eigenvectors  are  computed  by  the  product  of  all  the  Q,^.  To  avoid 
matrix  multiplications,  we  start  with  an  identity  matrix  I  and  perform  only  the 
Q3^  operations  on  I.  To  reduce  the  number  of  processors  for  the  eigenvector 
computation  by  half,  the  row  operations  are  done  in  two  parts.  2X4  windows 
are  selected  separately  for  the  left  half  and  the  right  half  of  the  rows  as  shown  in 
Figure  4.8.  Though  we  start  with  only  a  few  non-zero  elements  in  the  matrix,  we 
need  m/2  =  4  processors  because  every  step  and  every  iteration  produces  new 
entries  in  the  resulting  matrix. 
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Figure  4.8;  Selection  of  windows  for  the 
q\  operation  on  the  identity  matrix  for 
the  eigenvectors 
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4.8:  DESCRIPTION  OF  THE  PARALLEL  HARDWARE  BLOCK  DIAGRAM 

A  parallel  CORDIC  block  organization  is  shown  in  Figure  4.9.  Eigenvalues 

and  eigenvectors  are  computed  in  parallel  by  different  sets  of  processors.  The 

optimized  architecture  consists  of  eight  processors.  The  processor  Pq  computes 

the  angle  6j.  The  processors  P|,  P2  and  P3  compute  the  Qj^  and  Qi.2  operations  for 

the  eigenvalues  in  alternate  time  steps.  Only  four  processors  are  required  to 

compute  eigenvalues  irrespective  of  the  dimension  of  the  matrix  as  long  as  m>4. 

In  the  eigenvector  module,  processors  P4  to  P7  perform  the  operation 

starting  from  the  identity  matrix.  The  number  of  CORDIC  processors  for 

eigenvector  computation  equals  m/2.  Thus,  the  total  number  of  processors 

m 

required  to  compute  both  eigenvalues  and  eigenvectors  are  (4  +  -^ ). 

The  input  matrix  is  fed  row  wise  into  the  memory.  The  relevant  windows 
are  selected  with  the  help  of  a  controller  which  contains  a  simple  code.  The  2X3 
and  3X2  windows  are  selected  alternately  and  passed  to  the  input  registers. 
Three  parallel  operations  performed  by  processors  P^  to  P3  are  demonstrated  by 
the  following  Example. 

Example: 

Assume  that  Qi^,  Q2^/  Q3^andQi  operations  have  been  performed.  The  next 
operation  that  can  be  performed  is  Q2without  affecting  the  row  operations  and  at 
this  instant  the  second  and  third  columns  will  be  available  to  perform  Q2 
operational  shown  in  Figure  4.10 

fl]To  perforni  the  Q2  operation,  we  select  a  3  X  2  window  from  the  second 
and  third  columns.  This  window  is  passed  to  the  processors  P]  to  P3.  These 
processors  contain  the  angle  02  received  from  the  output  of  a  two  stage  FIFO 


Figure  4.9:  Parallel  hardware  block  diagram  of  a  CORDIC 
block  for  the  computation  of  QR  iterations 


j 
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connected  to  P^y 

12]  In  the  eigenvalue  conaputation  mode,  the  fourth  2X3  window  is  being 
selected  and  the  first  column  of  this  window  (which  are  the  elements  b4  and 
a4)  is  sent  to  Pq  to  compute  64. 

[3]  At  the  same  time,  P4  to  P7  contain  the  third  window  (right  half  as  shown 
in  Figure  4.7)  and  contain  the  angle  63  received  from  the  angle  processor 
through  a  one  stage  FIFO.  It  can  be  seen  that  the  operations  that  are 
performed  in  parallel  during  this  time  step  are 

[1]  Angle  computation  —  64 

[2]  Q2  on  columns  2  and  3  in  parallel  for  eigenvalues  performed  by  P^  to  P3. 

[3]  Q3^on  rows  3  and  4  (right  half)  in  parallel  for  eigenvectors  performed  by 
P4  to  P7 
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■  Figure  4.10:  Shows  the  window  selected  for  the  row  operation 

1  Q4  while  the  columns  2  and  3  are  free  for  Q2 

operation.  | 

N'ote:  ■*'  indicates  the  elements  which  have  been  rotated  and  updated. 


89 


Now,  we  have  Qi^Q2^Q3^TQ|Q2  as  the  new  matrix  and  the  angle  84  is 
passed  to  processors  to  P3  and  P4  to  P7,  where  Q4^  will  be  performed 
simultaneously.  The  operations  that  are  performed  in  parallel  during  this  time 
step  are 

[1]  Q4^on  rows  4  and  5  performed  by  Pj  to  P3. 

(2|  Q4^  on  rows  4  and  5  (left  half)  performed  by  P4  to  P7. 

The  sequence  of  operations  are  explained  in  the  next  section. 


4.9;  PRECEDExNCE  AND  PARALLELISM  OF  THE  COMPUTATIONS 

The  parallel  implementation  of  the  QR  decomposition  on  the  CORDIC 
processors  is  described  with  the  aid  of  the  flowchart  of  Figure  4.11.  Since 
eigenvalues  and  eigenvectors  can  be  computed  utilizing  the  same  rotation  angle 
9,  they  are  shown  separately  with  two  parallel  paths. 

4.9.1:  Eigenvalue  computation 

At  the  beginning  of  the  computation,  the  T  matrix  memory  is  initialized  to 
0.  Next,  the  first  row  of  the  input  T  matrix  is  loaded  into  the  memory.  The 
counters  are  initialized  to  i=l,  j=m-2  and  k=L  Then  a  3  X  2  window  on  columns 
j  and  j+1  which  are  6  and  7  is  selected  and  passed  to  the  processors  Pj  to  P3  for  the 
Q5  operation.  Since,  as  of  now,  only  one  row  has  been  loaded  into  the  memory, 
the  rest  of  the  rows  are  still  zeros  and  this  Cooperation  does  not  produce  any 
change  in  the  matrix.  After  the  window  has  been  passed,  the  next  row  of  T  is 
loaded  into  the  memory.  Now  we  select  a  2  X  3  window  on  T  and  pass  the  first 
column  of  the  window  to  the  processor  Pq  for  computing  Next,  P^  will  be 
computing  while  P]  to  P3  will  be  computing  Q^.  The  angle  for  this  Cooperation 
is  indeterminate  since  it  is  just  the  start  of  the  operation  and  also 
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P|  to  P3  will  have  zeros  as  inputs  and  produce  zeros  as  outputs  for  any  angle. 
Next,  the  previously  selected  2X3  window  is  passed  to  the  processors  P]  to  P3 
and  Q|^  is  performed  by  the  angle  received  from  Pq.  The  counters  i  and  j  are 
incremented  by  one. 

Now  j=7  which  is  less  than  8  (m=8)  the  operation  is  repeated  for  Qj.  Also 
i=2  which  is  less  than  m,  so  the  above  procedure  is  repeated  for  Q2^.  On  the  next 
incrementation  of  i  and  j,  i  becomes  3  and  j  becomes  8  which  does  not  satisfy  the 
condition  j<m  thus  resetting  j  to  1.  This  also  frees  the  first  two  columns  and  Q| 
operation  can  be  performed.  Note  that,  the  Q  operation  lags  behind  the 
operations  by  two  steps.  For  example  when  and  Q2^  are  being  computed  for 
the  3-rd  iteration,  and  Q7  of  the  2-nd  iteration  are  being  executed.  Qg  and  Q7 
are  being  computed  for  the  last  iteration  of  one  matrix  while  Q|^  and  Q2^  are 
being  performed  in  the  first  iteration  of  a  new  matrix. 

When  i=m,  it  indicates  that  all  the  rows  of  the  matrix  have  been  covered 
for  the  operation  and  K  is  incremented  by  one.  As  long  as  k  <  x  (where  x  is 
the  predetermined  number  of  iterations),  the  process  is  continued  and  when  k>x, 
the  eigenvalues  are  transferred  to  the  next  stage.  To  check  the  convergence  of 
the  decomposition,  we  have  two  options. 

[1]  We  can  have  one  CORDIC  block  and  perform  unspecified  number  of 

iterations  until  it  converges. 

[2]  We  can  predetermine  the  number  of  iterations  x 
4.9.2:  Eigenvector  computation: 

This  is  similar  to  the  eigenvalue  computation  except  that  the  column 
operations  are  not  performed.  First  the  memory  is  initialized  to  0.  The  first  row 
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of  the  I  matrix  is  loaded  and  the  counters  are  initialized  to  i=l,  L=m-1  and  k=l. 
The  left  2X4  vvindow  on  rows  i  and  i+1  is  selected  and  the  next  row  of  the  input 
matrix  is  loaded  into  the  memory.  Next  the  right  2X4  window  on  rows  L  and 
L+1  is  selected  and  passed  to  the  processors  P4  to  P7  and  the  operation  is 
performed  by  the  angle  G,.]  which  is  received  from  the  processor  P^  through  a 
one  stage  FIFO.  Then  the  operation  is  performed  on  the  left  2X4  window  of  I 
by  the  angle  0,.  The  counters  L  and  i  are  incremented  by  one  and  the  conditions 
are  checked.  The  count  of  L  will  be  8  in  the  first  pass  which  is  not  less  than  m 
and  L  will  be  reset  to  1.  When  i  =m,  k  is  incremented  by  1  and  once  the  preset 
number  of  iterations  are  performed,  the  eigem^ectors  are  transferred  to  the  next 
stage. 


Note  that  L  lags  i  by  one  operation  meaning  that  when  i=l,  is  being  done 
on  the  left  2X4  window  of  rows  i  and  i+1  and  L=7  which  does  the  on  the 
right  2X4  window  of  rows  7  and  8  of  the  previous  iteration  or  the  last  iteration 
of  the  previous  matrix. 

The  CORDIC  Algorithm  is  an  attractive  option  considering  its  simplicity, 
accuracy  and  its  capability  for  high  speed  execution  via  parallel  processing.  It  is 
highly  flexible  and  is  programmable  in  the  sense  that  the  same  processor  (or  a 
similar  one)  can  perform  a  wide  variety  of  functions.  In  our  example,  the 
structure  of  the  processors  which  do  the  rotation  computation  and  the  angle 
computation  are  the  same  and  the  only  difference  is  that  a  few  of  the  input 
programmable  bits  are  different. 

Lange  et  al  (21)  have  presented  a  CMOS  CORDIC  processor  which  has  a 
throughput  of  100ns  with  16  bits  accuracy  and  has  72  inputs  and  67  outputs. 


T 
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Considering  the  degree  of  advancement  of  today's  CMOS  technology,  more  than 
100  CORDIC  processors  can  be  fit  into  one  single  chip  [25].  Depending  on  the 
area  and  speed  constraints,  we  can  either  select 

1]  One  CORDIC  processor  block,  which  performs  an  unspecified  number  of 
iterations  until  convergence  is  achieved.  In  this  case,  only  one  matrix  can 
be  processed  at  a  time  and  the  next  matrix  can  be  loaded  only  after  the 
current  matrix  has  converged  resulting  in  lower  throughput,  but 
consuming  less  area. 

2]  We  can  pre-specify  the  number  of  iterations  and  provide  one  CORDIC  block 
for  every  iteration.  Here,  pipelining  is  achieved  at  the  cost  of  area. 

Further  work  on  the  application  of  CORDIC  approach  to  MUSIC /ESPIRIT 
algorithm  is  in  progress. 


Chapter  V 


DOA  ESTIMATION  FOR  WIDEBAND  SOURCES 


In  the  previous  chapters  mainly  DOA  estimation  for  narrowband  sources  are 
described.  Narrowband  approaches  for  DOA  estimation  have  been  extended  for 
wideband  emitter  sources.  In  the  following  sections,  some  of  the  available  wideband 
DOA  algorithms  available  in  the  literature  are  discussed.  After  studying  these 
algorithms,  one  of  them  will  be  selected  for  further  study  to  develop  real  time 
hardware. 

Su  and  Morf  [27]  has  presented  an  approach  to  estimate  DOA  of  wide-band 
sources.  They  decompose  the  rational  spectra  of  the  emitters  into  elementary  modes 
characterized  by  their  poles.  The  location  estimates  are  derived  for  each  mode  using 
the  signal  subspace  approach.  It  is  claimed  in  their  work  that,  using  modal 
decomposition  signal  subspace  algorithm  asymptotically,  exact  location  estimates 
can  be  obtained.  In  their  algorithm  more  emitters  can  be  resolved  than  the  number 
of  sensors.  The  emitters  may  be  correlated  or  coherent  and  required  knowledge  of 
the  noise  spectra  is  also  minimal.  This  MDSS  algorithm  requires  the  following 
steps: 

1.  Estimation  of  the  multichannel  covariance  sequence. 

2.  Estimation  of  poles  by  overdetermined  Yule-Walker  method. 

3.  Estimation  of  residues. 

4.  Emitter  location  estimation  from  peaks  of  the  spatial  spectrum 

i.e.  application  of  MUSIC  algorithm. 
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The  previous  MDSS  algorithm  of  Su  and  Morf  has  been  extended  for  DOA 
estimation  for  wide-band  signals  using  the  ESPRIT  algorithm  by  Ottersten  and 
Kailath  [28].  Their  approach  is  similar  to  Su  and  Morf,  the  only  difference  is  that 
ESPRIT  algorithm  is  applied  instead  of  MUSIC.  Following  steps  are  required  for  this 
algorithm 

1.  Estimate  the  covariance  of  the  output  of  the  array. 

2.  Estimate  the  system  poles  and  residues. 

3.  Determine  the  dimension  of  the  signal  subspace  at  the  system 

poles  and  find  vectors  that  span  this  space. 

4.  Estimate  the  angles  of  arrival  using  ESPRIT. 

Wang  and  Kaveh  [39]  have  proposed  another  method  of  detection  and  estimating 
the  DOA  of  multiple  wide-band  sources.  Their  method  requires  computat  o:i  of 
discrete  Fourier  transform  on  the  output  of  the  array.  Covariance  is  estimated  in 
different  frequency  bins.  The  approximate  transformation  matrices  are  computed 
using  initial  estimates  of  DOA's  and  the  array  manifold.  The  signal  subspaces  at 
discrete  frequencies  are  estimated.  The  eigendecomposition  is  performed  and  then 
MUSIC  algorithm  is  applied  to  find  DOAs. 

An  extension  of  Wang  and  Kaveh  method  [40]  has  been  proposed  by  Shaw 
and  Kumaresan  [26].  They  used  a  simple  bilinear  transformation  matrix  and  the 
approximation  resulting  from  dense  and  equally  spaced  array  structure  has  been 
used.  This  is  done  to  combine  the  individual  narrowband  spectral  matrices  for 
coherent  processing.  This  method  is  non-iterative  and  initial  estimates  of  the  DOA 
are  not  required. 


This  method  of  DOA  estimation  of  wideband  signals  seems  very  attractive 
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and  was  easy  to  simplify.  A  pipelined  flowchart  of  this  method  is  shown  in  Figure 
5.1.  In  this  flowchart,  it  is  assumed  that  there  are  8  sensuiS.  First  of  all  data  is 
collected  from  all  the  sensors.  It  is  also  assumed  that  data  is  collected  for  64  segments 
and  each  segment  will  consist  of  64  samples.  FFT  is  performed  and  then  covariance 
matrices  are  computed  which  are  then  averaged.  G  and  Gn  are  computed  in  parallel 
each  requiring  two  matrix  multiplications.  A  Choleski  decomposition  is  performed 
to  convert  G  into  standard  form  which  will  make  eigendecomposition  little  easier. 
Eigendecomposition  can  be  performed  by  previously  discussed  Householder 
transformation  and  QR  method.  Using  eigenvalues,  number  of  sources  are 
estimated  and  then  finally  angle  of  arrival  can  be  estimated. 

At  this  time  we  are  further  studying  this  algorithm  in  order  to  develop  its 
architecture. 


Continued... 


Chapter  VI 


CONCLUSIONS 


The  work  performed  for  the  development  of  parallel  architecture  for  sei.sor 
array  algorithms  fo^"  the  last  six  months  has  been  described  in  cection  6.1  and  future 
work  is  given  in  sec.lon  6.2.  Further  results,  recommendations,  suggestions  and 
conclusior's  v/ill  be  presented  in  the  finai  report. 

6.2:WORK  PERFORMED 

1.  A  Literature  survey  has  been  performed  and  investigated  for  various 
algorithms  available  for  narrow  band  and  wide  band  cases. 

2.  In  the  area  of  narrow  band,  MUSIC  and  ESPRIT  algorithms  were  selected 
and  further  studied.  These  algorithm  were  converted  into  computationaly  efficient 
algorithms  and  subsequently  parallelized.  Three  different  architectures  namely 
Systolic  architectures,  Cordic  processors  and  SIMD  are  under  consideration. 

3.  These  algorithms  required  eigenvalue  decomposition.  Householder 
transformation  was  used  to  convert  covariance  matrix  into  tridiagonal  matrix.  The 
QR  method  was  used  to  finally  obtain  eigenvalues  and  eigenvectors.  The  detailed 
systolic  architecture  is  being  developed  for  parallelized  Householder 
transformations  and  QR  method. 

4.  Single  instruction  multiple  data  (SIMD)  type  of  architectures  lend 
themselves  for  the  implementation  of  narrow  band  DOA  estimation.  The 
computation  of  covariance  matrix.  Householders  transformation  and  QR  method 
con  be  easily  computed  using  SIMD  machine.  The  work  on  SIMD  machine  is  under 
progress. 
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5.  A  third  approach  of  utilizing  cordic  processors  is  also  being  investigated  for 
the  implementation  of  eigendecomposition. 

6.  In  the  case  of  wideband  DOA  estimations,  various  algorithms  available  in 
the  literature  were  studied.  It  was  found  that  wideband  DOA  estimation  is  more 
computationaly  intensive  than  narrow  band  case.  An  algorithm  proposed  by  Shaw 
has  been  selected  for  further  study.  This  algorithm  again  has  been  modified  and 
substituted  with  computionally  efficient  operations.  A  parallelized  flowchart  has 
been  developed.  A  block  diagram  for  its  hardware  has  also  been  developed.  Other 
available  algorithms  are  also  under  investigation. 

6.3:  FUTURE  WORK 

1.  Detailed  architecture  design  will  be  done  for  both  MUSIC  and  ESPRIT  algorithms. 

2.  Interface  modules  will  be  designed  to  transfer  data  from  one  computational 
module  to  another. 

3.  High  level  simulation  will  be  performed  for  MUSIC  and  ESPRIT  to  check  finite 
precision  effects  and  finalize  word  length  for  various  modules. 

4.  A  simulation  will  be  performed  for  the  architecture  itself  using  VHDL  language 
of  Workview  4.0  software  from  Viewlogic. 

A  detailed  architecture  will  also  be  developed  for  the  wideband  case. 

6.  An  algorithm  equivalent  to  QR  method  is  being  investigated  which  may  be  more 
efficient  than  the  presently  proposed  algorithm. 

7.  Modifications  to  ESPRIT  algorithm  are  being  studied. 

8.  A  study  will  be  performed  to  estimate  real  time  requirements  for  the 
computations  of  DOA.  Based  on  this  study,  real  time  architecture  for  the 
computation  of  MUSIC  and  ESPRIT  will  be  proposed. 
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